PREFACE TO THE FIRST EDITION


Matrix theory, or more generally linear algebra, is a relatively recent
mathematical development. Its roots extend back 100 years to the work of 
Hamilton, Cayley, and Sylvester, but it has attracted widespread interest 
only in the past two or three decades. Today matrices are effective tools in 
quantum theory as well as classical mechanics; in aeronautical, mechanical, 
and electrical engineering; in statistics and linear programming and therefore 
in all the social sciences which these theories serve. 

Even a cursory glance at current mathematical literature reveals that 
matrix theory is in a stage of active growth. It is rather surprising therefore 
that the mathematical background required for an understanding of matrix 
theory is sufficiently modest that a substantial first course can be mastered 
by undergraduates. The major prerequisite is not a specific list of courses in 
mathematics, but rather the ability to reason abstractly, to proceed logically 
from hypothesis to conclusion. Anyone who possesses this quality, even la- 
tently, is capable of understanding the material presented here, whether he 
be an economist, psychologist, engineer, chemist, physicist, or mathematician. 
The necessary mathematical background is normally acquired in one or two 
years of college mathematics. 

Courses in matrix theory are currently presented in several ways. A com- 
putational course can be offered in which the calculations themselves are 
emphasized more than their meaning. Alternately, the study of matrices can 
be motivated from the familiar problem of solving systems of linear equa- 
tions, the important connection between matrices and linear transformations 
being deferred until the end of the course. A third procedure, and the one 
chosen here, systematically employs the elegant techniques of abstract algebra 
to develop simultaneously the algebra of matrices and the geometry of linear 
transformations. 

Although an axiomatic approach places a greater burden at the outset of 
the course on both the instructor and the student, the ultimate gain in under- 
standing linear algebra, and indeed modern mathematics as a whole, amply 
justifies this extra effort. Undeniably, the understanding of abstract concepts 
is aided by frequent illustrations from familiar contexts, and a serious attempt
has been made here to retain firm contact with concrete ideas while
steadily developing a higher degree of generality. 

The obvious purpose of this book is to present a lucid and unified intro- 
duction to linear algebra at a level which can be understood by undergradu- 
ates who possess reasonable mathematical aptitude, and thus to lay a solid 
foundation from which each student can apply these notions and techniques 
to the field of his interest. A more subtle but equally serious objective is to 
prepare the student for advanced scientific work by developing his powers of 
abstract reasoning. Linear algebra is admirably suited to this purpose. 

The present text is a revised form of mimeographed lecture notes written 
in 1951 for the second semester of a survey course in abstract algebra at 
Kenyon College. Because linear algebra is a cohesive body of knowledge 
which blends a variety of algebraic and geometric concepts, I believe that it 
provides a more natural introduction to abstract algebra than does the usual 
survey course. The first chapter, supplemented by Appendix A, presents the 
algebraic notions needed for this study; thus the text can be used by students 
who have no previous experience with modern algebraic techniques. 

The remaining material has been selected and arranged to proceed directly 
to the problem of equivalence relations and canonical forms. The duality of 
geometry and algebra is emphasized, but computational aspects are not ig- 
nored. Metric notions are deferred until Chapter 9, principally because they 
are not essential for the major part of this work. Depending upon the ability 
of the class, there is adequate material for a course of 60 class hours. Chapter 
10, and to a lesser extent Chapter 9, may be omitted without interfering 
seriously with the major aims of this presentation. 

Since some results of general interest are stated as exercises, each student 
is urged to make a practice of reading all exercises and learning the facts 
stated therein, even though he might choose not to prove all of them. 

Revision of the original notes was performed during my tenure as a Science
Faculty Fellow of the National Science Foundation. I am greatly indebted 
to the Foundation for its support, to Princeton University for the use of its 
facilities during this period, and to Kenyon College for assistance with the 
original version. Personal gratitude is expressed to J. G. Wendel for mathe- 
matical discussions extending over a decade, to A. W. Tucker for making 
available his recent work on combinatorial equivalence, to W. D. Lindstrom 
for reading critically selected parts of the manuscript, and to the editors and 
advisers of the publisher for many helpful suggestions. Appreciation is ex- 
pressed also to Mrs. Richard Anderson and Mrs. Charles Helsley for their 
help in preparing the original and revised manuscripts. 


February, 1960


D. T. FINKBEINER 
CHAPTER 1


Abstract Systems 


§1.1. Introduction

When a student begins the study of matrices he soon discovers properties 
which seem to have no counterpart in his previous mathematical experience. 
This is as unavoidable as it is exciting. The concepts he considers and the 
methods he employs are markedly different from those he has encountered
in mathematics through school and into college. 

Elementary school mathematics is concerned primarily with the arithmetic 
of number systems, beginning with the positive integers and developing grad- 
ually to include all of the integers and the rational numbers. The later use of 
letters to represent numbers is a real stride toward abstraction, and the 
corresponding study is called algebra instead of arithmetic. The algebraic 
problem of solving quadratic equations reveals the need for still more com- 
prehensive number systems, which leads to the study of real and complex 
numbers. 

An exception to the emphasis on numbers occurs in plane geometry, where 
the elements studied are called points and lines rather than numbers and 
equations, and where the relations between geometric figures are no longer 
numerical equality or inequality but congruence, similarity, parallelism, or 
perpendicularity. Geometry normally is the student's first excursion from 
the world of numbers to a realm of deductive thought which is essentially 
nonnumerical. 

Trigonometry and analytic geometry provide a bridge between numerical 
and geometric concepts by using numbers and equations to describe geometric 
figures. The objects studied are geometric, but the methods used in investi- 
gating their properties are numerical. This gradual transition from the study 
of numbers to the study of nonnumerical elements is continued in calculus, 
in which functions and operations on functions are described numerically but
have significant geometric interpretations. 

Likewise, matrices are described numerically and have important geometric 
interpretations. Matrices form a type of number system, and in this study 
we shall be concerned with developing the algebraic properties of this system. 
However, since matrices are intrinsically related to geometry, while their re- 
lation to arithmetic is comparatively superficial, a geometric interpretation 
of matrix theory is both natural and efficient. After this introductory chapter 
we turn immediately to a description of the geometric setting for our later 
work, finite-dimensional vector spaces. In Chapter 3 we study linear transforma- 
tions of such spaces, and not until Chapter 4 do we study matrices themselves. 

It is evident from the Table of Contents that various mathematical systems
are discussed in this book, many of which may now be unfamiliar to
you. Therefore, we shall begin by describing mathematical systems generally. 
In order to suggest that our present interest lies primarily in the internal 
structure of the system, rather than in its relation to the physical world, we 
speak of an abstract system. You are asked not to interpret the word “abstract”
as meaning that familiar systems will not arise as particular examples of 
abstract systems. As we shall see, an abstract system is often defined as a 
synthesis of some concrete system. 

This introductory chapter, which discusses abstract systems from the point 
of view of modern algebra, has three objectives: to introduce the basic nota- 
tion used in this book, to extend and modernize the student’s mathematical 
vocabulary, and to capture some of the spirit of abstract mathematics. 

To most mathematicians the esthetic appeal of an abstract system is suffi- 
cient justification for its study. From a more practical standpoint, the investi- 
gation of abstract systems has greatly clarified and unified the fundamental 
concepts of mathematics, and has provided a description of such important 
notions as relations, functions, and operations in terms of the simple and 
intuitive notion of set. Such a description is given in Appendix A. If you 
have a flair for abstraction or a desire for more precision than is afforded in 
this introductory chapter, you may wish to study Appendix A immediately. 
Or you may prefer to proceed more gradually, deferring a general study of 
basic concepts until you have acquired some experience with these ideas in 
the specific systems considered in this book. In either event, you should not 
expect to attain immediate and comfortable familiarity with all of the ideas 
presented here. You should study the text material carefully and work 
through the examples and exercises in detail, steadily increasing your facility 
for abstract thought and enhancing your insight into the nature of mathe- 
matics. 
Exercises

1. Assume that a club of two or more students is organized into committees 
in such a way that each of the following statements (postulates) is true. 

(a) Every committee is a collection of one or more students. 

(b) For each pair of students there is exactly one committee on which 
both serve. 

(c) No single committee is composed of all the students in the club. 

(d) Given any committee and any student not on that committee, there 
exists exactly one committee on which that student serves which has no 
students of the first committee in its membership. 

Prove each of the statements (theorems) in the following sequence, 
justifying each step of your proof by appealing to one of the five postulates 
or to an earlier theorem of the sequence. (Although this system may appear 
more concrete than abstract, its general nature is indicated by Exercise 2 of 
this section and Exercise 2 of § 1.2.) 

(i) Every student serves on at least two committees. 

(ii) Every committee has at least two members. 

(iii) There are at least four students in the club. 

(iv) There are at least six committees in the club. 

2. (i) Translate the description of the system of the previous exercise into 
geometric language by calling the club a “geometry,” a student a “point,”
and a committee a “line,” and by making other changes as needed to carry 
out the geometric flavor without changing the inherent meaning of the 
statements. 

(ii) Translate the four theorems into geometric language, and similarly 
translate your proof of the first theorem. 

(iii) Find the geometric system with the smallest number of points which 
satisfies the four postulates. (Such a system is called a finite geometry; other
theorems about the system can be discovered and proved.) 

(iv) Which of the four postulates are consistent with the axioms of 
Euclidean plane geometry? 


§1.2. Sets 

Before we attempt to describe abstract systems, we should first recognize 
that it is quite hopeless for two individuals to try to carry on an intelligent 
conversation unless they share some basic knowledge. We now state two 
general assumptions about the basic knowledge that is used as a foundation
for our later discussion: 

There is a common understanding of the basic language which we shall use 
to define other terms. 

There is a common understanding of the system of logical reasoning with 
which we proceed from hypothesis to conclusion. 

These assumptions are so general that they now appear to be quite elusive; 
presently we shall be more specific. In effect, the first assumption recognizes 
the futility of trying to define all terms which will be needed, and thereby 
establishes the need for undefined terms. The second assumption is a declara- 
tion that no formal treatment of the rules of deduction will appear in this book. 

More specifically, as part of our basic language we assume an intuitive 
understanding of the notion of a set. The word set is used to denote a collection,
class, family, or aggregate of objects, which are called elements or members of
the set. It is evident that this explanation does not define a set or the concept 
of membership in a set. These concepts are part of the undefined language 
which is assumed for this book. 

It is customary to denote a set by a capital letter and the elements of a set 
by small letters. To denote that an element b is a member of a set S, we write

    $b \in S$,

which can also be read as “b belongs to S.” To denote that b does not belong
to S we write

    $b \notin S$.

There are two common ways of specifying a set: 
by listing all its elements within braces, 

by stating a characteristic property which determines whether or not any 
given object is an element of that set. 

The notation which we adopt for the second method comprises two parts, 
separated by a vertical line, within braces. The first part tells us what type 
of elements are being considered, and the second part specifies the characteristic
property. For example, let I denote the set of all integers, and let S
denote the set of all integers whose square is less than seven. The two methods
of writing S are

    $S = \{-2, -1, 0, 1, 2\}$,

where we attach no importance to the order in which the elements are listed,
and

    $S = \{x \in I \mid x^2 < 7\}$,

which is read, “S is the set of all integers x such that $x^2$ is less than 7.”
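
As an illustrative aside (not part of the original development), the two ways of specifying S can be mirrored in Python, whose built-in set type likewise attaches no importance to the order of listing. The finite window of integers used below is an assumption made only so that the computation terminates; every integer outside it has a square of at least 7.

```python
# Method 1: list all the elements within braces.
S_listed = {-2, -1, 0, 1, 2}

# Method 2: state a characteristic property.
# The window -10..10 stands in for the set I of all integers (an assumption
# for illustration; integers outside it have squares far larger than 7).
S_by_property = {x for x in range(-10, 11) if x * x < 7}

# Order of listing is immaterial, and both methods describe the same set.
same_set = (S_listed == S_by_property) and ({2, 1, 0, -1, -2} == S_listed)
print(same_set)
```

The comprehension has the same two parts as the notation above: the kind of element considered, followed by the characteristic property.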

We use the symbol $\Phi$ to denote the void or empty set, which contains no
elements at all. While the void set may seem at first to be an artificial notion,
its acceptance as a bona fide set is convenient.

Sets S and T are said to be equal if and only if they contain exactly the 
same elements. In terms of the membership relation, this is expressed as: 

    $T = S$ means “$x \in T$ if and only if $x \in S$.”

The next concept is that of a subset. If S and T are sets, T is said to be a
subset of S, written $T \subseteq S$, if and only if every element of T is an element of
S; that is,

    $T \subseteq S$ means “if $x \in T$, then $x \in S$.”

The subset notation $T \subseteq S$ can also be written $S \supseteq T$. This notation is
analogous to the notation for inequality of numbers: a < b means the same
as b > a. It is readily verified that equality of sets can be expressed in terms
of the subset notation as:

    $T = S$ means “$T \subseteq S$ and $S \subseteq T$.”

It is clear from the definition of subset that any set is a subset of itself.
Also, since the void set has no elements, the statement “If $x \in \Phi$, then
$x \in S$” is logically valid for any set S. Thus we have

    $\Phi \subseteq S$ and $S \subseteq S$ for every set S.

T is said to be a proper subset of S if and only if every element of T is an
element of S, but not every element of S is an element of T. This is written:

    $T \subset S$ if and only if $T \subseteq S$ and $T \neq S$.
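
The subset relations just defined can be checked mechanically on small finite sets. The Python sketch below is an illustration only; the helper names `is_subset` and `is_proper_subset` and the sample sets are hypothetical, chosen to transcribe the definitions as literally as possible.

```python
S = {1, 2, 3, 4}
T = {2, 4}
void = set()  # plays the role of the void set

def is_subset(T, S):
    """Every element of T is an element of S."""
    return all(x in S for x in T)

def is_proper_subset(T, S):
    """T is a subset of S, and T is not equal to S."""
    return is_subset(T, S) and T != S

checks = {
    "T is a subset of S": is_subset(T, S),
    "S is a subset of itself": is_subset(S, S),
    "the void set is a subset of S": is_subset(void, S),  # vacuously true
    "T is a proper subset of S": is_proper_subset(T, S),
    "S is a proper subset of itself": is_proper_subset(S, S),
}

# Equality of sets coincides with inclusion in both directions.
mutual_inclusion_is_equality = (is_subset(T, S) and is_subset(S, T)) == (T == S)
```

Note that the void set passes the subset test because `all` over an empty collection is true, which matches the logical argument given above.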

In practice it is sometimes useful to adopt geometric language for sets, 
calling an element a point even though the element may have no obvious 
geometric character. When we adopt such descriptive language we must 
bear in mind that this is done only for convenience, and we must carefully 
refrain from assuming any properties which 
may be suggested by our language but which 
are not otherwise legitimately established. 

A geometric interpretation of sets suggests 
the use of sketches, called Venn diagrams, to 
represent sets and relations between sets. Thus 
the subset relation $T \subseteq S$ can be shown graphically
as in Figure 1.1. Such diagrams provide
insight concerning sets, and often they suggest methods by which statements 
about sets can be proved or disproved. However, diagrams are not valid sub- 
stitutes for formal proofs. 

We now turn our attention to several ways in which sets can be combined 
to produce other sets. Let S be any set, and let K denote the collection of 
all subsets of S. (The elements of K are the subsets of S.) We shall define three
operations on sets, called union, intersection, and complementation:

    Union: $A \cup B = \{x \in S \mid x \in A \text{ or } x \in B\}$,

    Intersection: $A \cap B = \{x \in S \mid x \in A \text{ and } x \in B\}$,

    Complementation: $A' = \{x \in S \mid x \notin A\}$.

Thus if A and B are any elements of K, the set “A union B” consists of all
elements of S which belong to A, or to B, or to both. The set “A intersection
B” consists of all elements of S which belong to both A and B. The
“complement of A in S” consists of all elements of S which do not belong to A.
If $A \cap B = \Phi$ (the void set), we say that A and B are disjoint. Venn diagrams
which illustrate these operations are shown in Figure 1.2. More generally,
the union and intersection of any family F of subsets of S are defined in an
analogous way:

    $\bigcup_{A \in F} A = \{x \in S \mid x \in A \text{ for some } A \in F\}$,

    $\bigcap_{A \in F} A = \{x \in S \mid x \in A \text{ for all } A \in F\}$.
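
For a small finite set these definitions can be tried out directly. The following Python sketch is illustrative only; the particular sets S, A, B and the family F are hypothetical sample data, and the comprehensions are written to follow the defining properties rather than to be efficient.

```python
S = set(range(1, 9))   # the fixed set S (sample data)
A = {1, 2, 3, 4}
B = {3, 4, 5}

# The three operations, written from their defining properties.
union = {x for x in S if x in A or x in B}
intersection = {x for x in S if x in A and x in B}
complement_A = {x for x in S if x not in A}

# Python's built-in set operators give the same results.
agree = (union == A | B) and (intersection == A & B) and (complement_A == S - A)

# Union and intersection of a family F of subsets of S.
F = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
family_union = {x for x in S if any(x in member for member in F)}
family_intersection = {x for x in S if all(x in member for member in F)}

# Two sets are disjoint when their intersection is the void set.
disjoint = (A & {7, 8}) == set()
```

Here "for some" and "for all" in the family definitions correspond to `any` and `all`, respectively.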

Having made these definitions, we now have before us a concrete example 
of an abstract system, the general nature of which will be discussed in § 1.5.
To pave the way for that description, it will be useful to recognize that we
have been developing an algebraic system in which the objects of our attention
are the various subsets of a fixed set S. We have defined two relations for
subsets, the equality relation and the subset relation. Furthermore, we have
introduced two methods, union and intersection, by which any two subsets 
of S may be combined to produce another subset of S. A third operation, 
complementation, provides a means by which each subset of S determines 
some other subset of S. If we chose to do so, we could now undertake a 
thorough investigation of the properties of this system, proving theorems 
about the algebra of sets. A few such theorems are listed below as exercises, 
while a more complete and systematic discussion is given in Appendix A, § A.6. 
Exercises

1. (i) List all the subsets of S = {a, b, c, d}.

(ii) If S is a finite set having m elements, how many subsets does S 
have? Prove your answer. 

2. Referring to Exercise 1, § 1.1, translate the description of the system 
into the abstract language and notation of sets. Also translate the four 
theorems (but you need not prove them again). 

3. Let A and B be arbitrary subsets of a given set S. Use the definitions of 
set theory to prove the following theorems. 

(i) $A \subseteq B$ if and only if $A \cup B = B$.

(ii) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

(iii) $(A \cup B)' = A' \cap B'$.

4. The difference of two subsets of S is defined by 

    $A - B = \{x \in A \mid x \notin B\}$.

(i) Find and prove an equation which expresses $A - B$ in terms of the
basic set operations ($\cup$, $\cap$, and $'$). Draw an illustrative Venn diagram.

(ii) Prove: $(A - B)' = A' \cup B$.

(iii) Is set difference a commutative operation? That is, are $A - B$ and
$B - A$ equal? Prove your answer.

5. The symmetric difference $A \oplus B$ of two subsets of S is the set of all
elements which belong either to A or to B, but not to both A and B.

(i) Find and prove an expression for $A \oplus B$ in terms of the basic set
operations ($\cup$, $\cap$, and $'$). Draw an illustrative Venn diagram.

(ii) Prove that $(A \oplus B)' = (A' \cup B) \cap (A \cup B')$.

(iii) Is symmetric difference a commutative operation? That is, are
$A \oplus B$ and $B \oplus A$ equal? Prove your answer.

(iv) Discover a simple description of the set $(A \oplus B) \oplus C$.

(v) Is symmetric difference an associative operation? That is, are
$(A \oplus B) \oplus C$ and $A \oplus (B \oplus C)$ equal? Prove your answer.

6. Show that set union can be expressed in terms of intersection and 
complementation. 

7. Considering only finite sets, let n(S) denote the number of elements in S. 

(i) Prove $n(A \cup B) = n(A) + n(B) - n(A \cap B)$.

(ii) Discover and prove a similar result for $n(A \cup B \cup C)$.
§1.3. Relations

Given any set S, it is often necessary to know how the elements of S are 
related, one to another. Two objects might be of the same color, but one 
might be heavier than the other. Line L might intersect line M but be parallel 
to line N. In general, then, we wish to consider all ordered pairs (a, b) of
elements of S, and to have some means of distinguishing certain of those
pairs from other pairs. The pairs which are thus distinguished constitute a
binary relation on S, and we say that a is related to b if and only if the pair
(a, b) is a distinguished pair.

To proceed more formally, we first define the cartesian product of a set S 
with itself to be the set $S \times S$ of all ordered pairs of elements of S:

    $S \times S = \{(a, b) \mid a \in S \text{ and } b \in S\}$.

Two pairs are equal if and only if their first components are the same and their
second components are the same. Then a binary relation R on S is simply
a subset of $S \times S$. If $(a, b) \in R$ we say that a is related to b by the relation R,
and we write a R b. If $(a, b) \notin R$, we write $a \not R \, b$.
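
To make the idea of a relation as a set of ordered pairs concrete, the following Python sketch builds the cartesian product of a small set with itself and picks out the pairs belonging to the relation "less than." It is an illustration only; the set S and the helper `related` are hypothetical.

```python
from itertools import product

S = {1, 2, 3}

# The cartesian product of S with itself: all ordered pairs (a, b).
S_times_S = set(product(S, repeat=2))

# A binary relation on S is a subset of the product; here, "a is less than b".
R = {(a, b) for (a, b) in S_times_S if a < b}

def related(a, b, R):
    """a is related to b by R exactly when the pair (a, b) belongs to R."""
    return (a, b) in R
```

Since the pairs are ordered, `related(1, 2, R)` and `related(2, 1, R)` are different questions, and for this relation they have different answers.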

Examples 

(a) If R is the set of all real numbers, representing the points on a real
coordinate axis, then $R \times R$ is the set of all ordered pairs of real numbers,
representing the points on a real coordinate plane. The relation of equality
is described by the set $\{(a, a) \mid a \in R\}$, consisting of all points on the line
y = x. The relation a < b is represented by all points which lie above the
line y = x.

(b) If I is the set of integers, $I \times I$ is represented by the set of all points
of the plane for which both coordinates are integers, called “lattice points.”
The relation “$a - b$ is evenly divisible by 3,” written $a \equiv b \pmod 3$, is represented
by all lattice points which lie on any line of slope 1 through the
points (3n, 0), $n \in I$.

In linear algebra, as elsewhere throughout mathematics, we are particularly 
interested in binary relations which have the abstract properties of equality. 
These are 

    Reflexivity: a R a for every $a \in S$,

    Symmetry: if a R b then b R a,

    Transitivity: if a R b and b R c, then a R c.

Any relation R on S which has these three properties is called an equivalence
relation.
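
On a finite set the three properties can be tested exhaustively. The Python sketch below is offered as an illustration under stated assumptions: a relation is stored as a set of ordered pairs, the function names are hypothetical, and a finite window of integers stands in for the set of all integers. It confirms that congruence modulo 3 has all three properties on that window, while "less than" does not.

```python
def is_reflexive(R, S):
    return all((a, a) in R for a in S)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    # Whenever (a, b) and (b, d) both belong to R, (a, d) must belong to R.
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, S):
    return is_reflexive(R, S) and is_symmetric(R) and is_transitive(R)

# A finite window of integers (an assumption made for illustration).
S = set(range(-6, 7))

congruent_mod_3 = {(a, b) for a in S for b in S if (a - b) % 3 == 0}
less_than = {(a, b) for a in S for b in S if a < b}
```

A check of this kind is evidence on the sample only; it does not replace a proof for the full set of integers.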

One of the major investigations that we shall undertake in this book con- 
cerns a description of various equivalence relations which arise in a natural 
manner in the study of matrices. Therefore, before reading Chapter 6 you
will need to become thoroughly familiar with the concept of binary relations
and especially equivalence relations. These notions are developed more generally
and completely in Appendix A.

Exercises 

1. Given the set S = {1, 2, 6, 8}. 

(i) Represent $S \times S$ geometrically.

(ii) Represent geometrically the subset of $S \times S$ which is determined
by the relation “a divides b evenly” for $a, b \in S$.

(iii) Represent geometrically the subset of $S \times S$ which is determined
by the relation “a > b” for $a, b \in S$.

2. Show that if S has more than three elements, then $S \times S$ has more than
60,000 subsets.

3. Since the void set and $A \times A$ are subsets of $A \times A$, each represents
a relation on A. Describe each of these two relations.

4. Let I be the set of all positive integers. For any fixed $n \in I$ we define
on I the relation congruence modulo n by writing

    $a \equiv b \pmod n$

if and only if $a - b$ is divisible by n.

(i) Prove that this is an equivalence relation on I.

(ii) Prove that if $a \equiv b \pmod n$ and if $c \equiv d \pmod n$, then

    $a + c \equiv b + d \pmod n$ and $ac \equiv bd \pmod n$.

(iii) Show by an example that if $c \not\equiv 0 \pmod n$ and if $ac \equiv bc \pmod n$,
it does not necessarily follow that $a \equiv b \pmod n$.

§1.4. Functions and Mappings

It is assumed that you are familiar with numerical functions from your 
previous mathematical experience; our purpose here is to describe functions 
in a more general setting. Recall that each numerical function f has a domain
of definition (a set D of numbers), and that to each $x \in D$ the function f
assigns a unique number f(x), which is called the value of f at x. As x varies
over D, the numbers f(x) form a set R which is called the range of f.

A generalization from numerical functions to arbitrary functions is made 
very easily — we simply drop the requirement that D and R be sets of numbers, 
and consider them to be abstract sets. 
A function F from a set A into a set B consists of

1. a nonvoid subset $D \subseteq A$, called the domain of F,

2. a correspondence such that to each $a \in D$ there is associated one and
only one $b \in B$.

The element $b \in B$ which is associated with $a \in D$ by the function F is
often denoted F(a). The range of F is the subset $R \subseteq B$, defined by

    $R = \{b \in B \mid b = F(a) \text{ for some } a \in D\}$.

If R = B, we say that F is a function from A onto B.
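
For finite sets, a function in this general sense can be modeled by a Python dictionary, which by construction associates exactly one value with each key. The sketch below is an illustration only; the sets A and B and the particular correspondence are hypothetical sample data.

```python
A = {1, 2, 3, 4, 5}
B = {"p", "q", "r"}

# The correspondence: each key is associated with one and only one element of B.
F = {1: "p", 2: "q", 3: "p"}

# The domain D is the nonvoid subset of A on which F is defined.
D = set(F.keys())

# The range R consists of those b in B with b = F(a) for some a in D.
R = {b for b in B if any(F[a] == b for a in D)}

# F is a function from A onto B exactly when R = B.
is_onto = (R == B)
```

In this sample the domain is a proper subset of A and the element "r" is never an image, so F maps A into B but not onto B.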

Here again a translation to geometric language is useful. The domain D 
and range R are sets of points, and the function F associates with each point 
of D exactly one point of R. 

A reasonable geometric synonym of function is mapping, a terminology 
suggested by Figure 1.3. Likewise, the point F(x) is called the image of x
under the mapping F.

[Figure 1.3: a mapping F carrying D = dom F onto R = range F]

For functions of abstract sets this geometric terminology
is more descriptive than that used for numerical functions. There is also a
useful notational change which we shall adopt: instead of denoting the image
of x under F by the functional notation F(x), we shall omit the parentheses
and show F in boldface type to the right of x. In this new notation a function
will be indicated thus:

    $\mathbf{F}: x \to x\mathbf{F}$, for all $x \in \operatorname{dom} \mathbf{F}$.

Thus for each $x \in \operatorname{dom} \mathbf{F}$ the symbol $x\mathbf{F}$ denotes the image of x under the
mapping F.

It is important to observe that although a mapping F assigns a unique
image $x\mathbf{F}$ to each $x \in \operatorname{dom} \mathbf{F}$, it is quite possible that a point of range F is
the image of more than one point of dom F; that is, from the equation $x_1\mathbf{F} = x_2\mathbf{F}$
we cannot deduce that $x_1 = x_2$. Therefore, mappings in general are many-to-one,
in the sense that many distinct points of the domain may be mapped
into the same point of the range. Just as with numerical functions, if it
happens that each point of range F is the image of exactly one point of dom F,
then an inverse mapping $\mathbf{F}^*$ can be defined from range F onto dom F, such
that if $y = x\mathbf{F} \in \operatorname{range} \mathbf{F}$, then

    $y\mathbf{F}^* = (x\mathbf{F})\mathbf{F}^* = x$ for every $x \in \operatorname{dom} \mathbf{F}$.
[Figure 1.4]

In summary, we say that a mapping F is one-to-one (or reversible) if and
only if $x_1\mathbf{F} = x_2\mathbf{F}$ implies $x_1 = x_2$. If F is one-to-one, then an inverse function
$\mathbf{F}^*$ can be defined from range F to dom F such that for each $x \in \operatorname{dom} \mathbf{F}$,
$(x\mathbf{F})\mathbf{F}^* = x$.
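
The construction of an inverse can be sketched for finite mappings stored as Python dictionaries. This is an illustration under that modeling assumption; the function names `is_one_to_one` and `inverse` and the sample mappings are hypothetical.

```python
def is_one_to_one(F):
    """No two distinct points of the domain share an image."""
    images = list(F.values())
    return len(images) == len(set(images))

def inverse(F):
    """Return the inverse mapping from range F to dom F, if F is reversible."""
    if not is_one_to_one(F):
        raise ValueError("mapping is many-to-one; no inverse exists")
    return {image: x for x, image in F.items()}

F = {1: "p", 2: "q", 3: "r"}   # one-to-one
G = {1: "p", 2: "q", 3: "p"}   # many-to-one: 1 and 3 share the image "p"

F_star = inverse(F)

# Mapping x by F and then by the inverse returns x itself.
round_trip = all(F_star[F[x]] == x for x in F)
```

Calling `inverse(G)` raises an error, since the image "p" could not be sent back to a single point of the domain.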

The equation $(x\mathbf{F})\mathbf{F}^* = x$ states that if x is mapped first by F and if the
image $x\mathbf{F}$ is then mapped by $\mathbf{F}^*$, the resultant image is x itself. This is a
special instance of the more general concept of successive mappings. Suppose
that A, B, and C are sets, that F is a mapping from A into B, and that G is a
mapping from B into C. Whenever the range of F is a subset of the domain
of G, we can define a direct mapping FG from A into C as

    $\mathbf{FG}: x \to x\mathbf{FG} = (x\mathbf{F})\mathbf{G}$ for all $x \in \operatorname{dom} \mathbf{F}$.



The concept of composite function is very important in linear algebra 
because matrix multiplication can be interpreted in terms of successive map- 
pings of vector spaces into vector spaces. The right-hand notation which we 
have adopted for functions places the symbol for a mapping to the right of 
the symbol for the object which is mapped, a convention which is particularly
convenient for composite functions. Thus, the image of x under the composite 
mapping “F followed by G” is denoted by xFG, the individual symbols 
written from left to right in their natural order of occurrence. By contrast, 
in the familiar left-hand notation which is used for functions in introductory 
analysis the corresponding notation is G(F(x)), and the term “function of a
function” is used. As we shall see later when we represent linear mappings by 
matrices, there are important differences in the form of that representation, 
depending upon whether mappings are written in right-hand or left-hand 
notation. Both systems are widely used. 
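
The left-to-right reading of successive mappings can be illustrated with finite mappings stored as Python dictionaries. The sketch below is an illustration only; the function `compose` and the sample mappings are hypothetical, and the argument order is chosen to match the phrase "F followed by G."

```python
def compose(F, G):
    """Return the mapping 'F followed by G'.

    Defined only when the range of F is a subset of the domain of G.
    """
    if not set(F.values()) <= set(G.keys()):
        raise ValueError("range of F must lie in the domain of G")
    # Left-hand notation would write the image of x as G(F(x)).
    return {x: G[F[x]] for x in F}

F = {1: "a", 2: "b", 3: "a"}       # a mapping from {1, 2, 3} into {"a", "b", "c"}
G = {"a": 10, "b": 20, "c": 30}    # a mapping from {"a", "b", "c"} into numbers

FG = compose(F, G)
```

The domain of the composite is the domain of F, and `compose(G, F)` is not defined here because the range of G does not lie in the domain of F.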

Exercises 

1. Let S, T, U, and V be sets, and let F, G, and H be mappings whose
domains are S, T, and U, and whose ranges are subsets of T, U, and V,
respectively.

(i) Describe the domain and range of FG and GH. 

(ii) Show that F(GH) = (FG)H. 

(iii) Suppose that F and G are both reversible (one-to-one). Show that
(FG)* = G*F*, and that (F*)* = F.

(iv) Suppose U = S. Describe the domain and range of FG and GF. 
Are these two functions equal? Explain. 

(v) Suppose U = T = S. Are FG and GF equal functions? Explain. 

2. Let F and G both be mappings from the real numbers to the real numbers.
Since multiplication of real numbers is defined, a product $F \odot G$ of
mappings can be defined by

    $x(F \odot G) = (xF)(xG)$.

(The symbol $\odot$ is used to prevent confusion with the successive application
of mappings, FG, as in Exercise 1.) Prove $F \odot G = G \odot F$.

3. The zero mapping O is defined by xO = 0 for all real x. Let

    xF = |x| + x,
    xG = |x| - x.

Prove $F \neq O$, $G \neq O$, but $F \odot G = O$, where $\odot$ is defined as in Exercise 2.


§1.5. Abstract Systems 

We are now ready to describe the nature of an abstract system. First a 
system must have a nonvoid set of elements, the building blocks of the system.
These elements are regarded as having no properties other than those which
are prescribed by the system, even though we might use geometric terms or
other descriptive names for them. 

Example 

To illustrate this discussion we shall use the specific example of the collec- 
tion K of all subsets of a given set S. The system which we describe in this 
example has K as its set of elements. Thus an element of the system is a 
subset of S, not an element of S.

Next we need a set of relations between elements. One of the relations must 
be a definition of equality, which provides a criterion for distinguishing one 
element from another. Generalized forms of equality, called equivalence relations,
play a vital role in matrix theory. (See § 6.1 and Appendix A.)

Example 

Equality of two elements of K means that they are the same subset of S. 
Furthermore, a second relation, denoted $\subseteq$, is defined between various elements
of K.

A system will also include a set of operations, ways of combining elements 
to produce other elements of the system. Since we are working entirely within 
a given system, we require that the result of an operation on any elements 
of the system be an element of that system. Such an operation is called a 
closed operation, and we consider only operations with this property. An
operation which combines any two elements of a system to produce a single 
element of the system is called a binary operation. Similarly, a unary opera- 
tion maps each single element of the system into a corresponding element of 
the system. 

Example 

For the elements of K we list three operations: union, intersection, and 
complementation. Union and intersection combine two elements of K to 
produce an element of K , and therefore can be regarded as mappings of 
K × K into K. Complementation operates on any element of K to produce
another element of K, and thus can be considered as a mapping of K into K. 
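
The closure of these three operations can be checked by brute force on a small case. The sketch below is only an illustration: the three-element set S and the helper names (power_set, union, intersection, complement) are assumptions chosen for the example, not part of the text.

```python
from itertools import combinations

def power_set(s):
    """Return the collection K of all subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}

S = frozenset({1, 2, 3})   # hypothetical small set S
K = power_set(S)           # the elements of the system

def union(a, b):
    return a | b

def intersection(a, b):
    return a & b

def complement(a):
    return S - a

# A binary operation is closed if it maps K x K into K.
binary_closed = all(op(a, b) in K
                    for op in (union, intersection)
                    for a in K for b in K)

# A unary operation is closed if it maps K into K.
unary_closed = all(complement(a) in K for a in K)

print(len(K), binary_closed, unary_closed)
```

An exhaustive check like this verifies closure only for the chosen S; the general statement still rests on the definitions of union, intersection, and complement.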

To endow the elements, relations, and operations of a system with desired 
properties, a set of postulates is prescribed. These are statements or axioms 
which are assumed to be valid for the system. Indeed, the postulates are 
initially the only description of the system. 

From the postulates certain deductions can be made by means of the rules 
of logic which are part of the basic knowledge underlying the system. These
deductions are theorems of the system and, once proved, they possess the
same validity in the system as do the original postulates. 

Another important feature of a system is its set of definitions, which are
agreements concerning terminology for concepts constructed from the skeleton 
of the system. By means of definitions, new relations and operations can 
be introduced, or elements with special properties can be given distinctive 
names. As theorems are proved and the known facts of the system grow in 
number and complexity, a proper use of definitions helps to classify the infor- 
mation of the system and to simplify its internal language. 

A brief discussion of the postulates, theorems, and definitions of the system
of all subsets of a given set is contained in § A.6.

In summary, an abstract system S consists of 

a set E of elements, 
a set R of relations, 
a set O of operations,
a set of postulates, 
a set of theorems, 
a set of definitions. 

Of course, the heart of the system is the set of postulates, and all else is 
derived from the postulates. For notational purposes, however, it is con- 
venient to emphasize the elements, relations, and operations by writing 

S = {E; R; O}

to denote the system. Here E denotes a set of elements, R a set of relations,
and O a set of operations.

When we speak of an element x of a system S, we refer to a member of E. 
However, it is customary in mathematical literature to ignore the distinction 
between the system itself and the set of elements of the system; thus we shall
often write x ∈ S when, strictly speaking, we mean x ∈ E.

Most of the systems which we shall study are defined by postulates which 
describe the elements and the operations. A few of the postulates occur in so 
many different systems that it is convenient to assign special terminology to 
them. To do so, we suppose that S is an abstract system whose set of elements
is denoted by E. For the sake of clarity you might wish to interpret the 
following discussion in terms of a familiar example, such as the system of 
integers with addition and multiplication as operations, or the system of 
subsets of a set with union and intersection as operations. 

First we consider three properties of binary operations. 

Associativity: The operation * is associative if and only if for all a, b, c ∈ E,

a * (b * c) = (a * b) * c.

Commutativity: The operation * is commutative if and only if for all a, b ∈ E,

a * b = b * a.

Distributivity: The operation ∘ is distributive over the operation * if and
only if for all a, b, c ∈ E,

a ∘ (b * c) = (a ∘ b) * (a ∘ c),

(b * c) ∘ a = (b ∘ a) * (c ∘ a).

We say that ∘ is left-distributive over * if the first of these two conditions
is satisfied, and that ∘ is right-distributive over * if the second condition is
satisfied.

Observe that for integers, both addition and multiplication are associative 
and commutative, and multiplication is distributive over addition. For sub- 
sets, both union and intersection are associative and commutative, and each 
operation is distributive over the other. 
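
These observations can be spot-checked mechanically. The following is a minimal sketch, assuming a small sample of integers and the subsets of a hypothetical three-element set; the function names are illustrative. For the integers a finite sample can only refute a property or fail to refute it, whereas for the subsets of the chosen set the check is exhaustive.

```python
from itertools import combinations, product

def is_associative(op, elems):
    return all(op(a, op(b, c)) == op(op(a, b), c)
               for a, b, c in product(elems, repeat=3))

def is_commutative(op, elems):
    return all(op(a, b) == op(b, a) for a, b in product(elems, repeat=2))

def is_distributive(circ, star, elems):
    """True if circ is both left- and right-distributive over star on elems."""
    left = all(circ(a, star(b, c)) == star(circ(a, b), circ(a, c))
               for a, b, c in product(elems, repeat=3))
    right = all(circ(star(b, c), a) == star(circ(b, a), circ(c, a))
                for a, b, c in product(elems, repeat=3))
    return left and right

add = lambda a, b: a + b
mul = lambda a, b: a * b
union = lambda a, b: a | b
meet = lambda a, b: a & b

ints = range(-3, 4)                      # sample of integers (assumption)
subsets = [frozenset(c)                  # all subsets of {1, 2, 3}
           for r in range(4)
           for c in combinations((1, 2, 3), r)]

# (multiplication over addition, addition over multiplication)
int_results = (is_distributive(mul, add, ints),
               is_distributive(add, mul, ints))

# (union over intersection, intersection over union)
set_results = (is_distributive(union, meet, subsets),
               is_distributive(meet, union, subsets))

print(int_results, set_results)
```

The asymmetry in the integer case (multiplication distributes over addition but not conversely) is exactly what distinguishes it from the subset system, where each operation distributes over the other.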

Postulates about elements most often assert the existence of elements with 
unusual properties. We give three examples of such elements. 

Identity element: An element i ∈ E is called an identity relative to the
operation * if and only if, for every x ∈ E,

i * x = x = x * i.

Inverse element: Given an identity i relative to * and an arbitrary element
x ∈ E, an element x' is called an inverse of x relative to * if and only if

x * x' = i = x' * x.

Idempotent element: An element x ∈ E is said to be idempotent relative to *
if and only if

x * x = x.

In the system of integers, 0 is an identity of addition, and 1 is an identity 
of multiplication. Both are idempotent relative to multiplication, but 0 is the 
only integer which is idempotent relative to addition. Each integer x has an 
additive inverse, −x, but 1 and −1 are the only integers that have a
multiplicative inverse which is an integer.

In the system of subsets of S, the void set Φ is an identity of set union
and S is an identity of set intersection. Every set is idempotent relative to
both union and intersection. The void set Φ is self-inverse relative to union,
and S is self-inverse relative to intersection, and these are the only sets which 
have either type of inverse. 
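
The claims about the system of subsets can be confirmed by exhaustive search on a small example. This is a sketch under the assumption that S = {1, 2, 3}; the helper names identities, invertible, and idempotents are hypothetical.

```python
from itertools import combinations

S = frozenset({1, 2, 3})                       # hypothetical small set S
K = [frozenset(c)                              # all subsets of S
     for r in range(len(S) + 1)
     for c in combinations(sorted(S), r)]

def identities(op, elems):
    """All i with i op x == x == x op i for every x."""
    return [i for i in elems
            if all(op(i, x) == x == op(x, i) for x in elems)]

def invertible(op, elems, i):
    """All x that have some y with x op y == i == y op x."""
    return [x for x in elems
            if any(op(x, y) == i == op(y, x) for y in elems)]

def idempotents(op, elems):
    """All x with x op x == x."""
    return [x for x in elems if op(x, x) == x]

union = lambda a, b: a | b
meet = lambda a, b: a & b
empty = frozenset()

print(identities(union, K), identities(meet, K))
print(invertible(union, K, empty), invertible(meet, K, S))
print(len(idempotents(union, K)), len(idempotents(meet, K)))
```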

Exercises 

1. Prove the following about an abstract system S = {E; *, ∘}, assuming
no properties of the system other than those stated for each case.

(i) Each operation can have at most one identity element.

(ii) If * is associative and has an identity i ∈ E, then each x ∈ E can
have at most one inverse element relative to *.

(iii) If an identity element of an operation exists, it is idempotent
relative to that operation.

(iv) If ∘ is distributive over *, if i is an identity of ∘, and if e is an
identity of *, then e is idempotent relative to both ∘ and *. Is i idempotent
relative to *?

Interpret these results for the special case of addition and multiplication of 
real numbers. 

2. In each case below determine whether or not the given function describes 
a closed operation on the given set. 

(i) Multiplication of even integers. 

(ii) Multiplication of odd integers. 

(iii) Addition of odd integers. 

(iv) Addition of even integers. 

(v) Multiplication of real functions which are differentiable. 

(vi) Differentiation of real functions which are differentiable.

(vii) Differentiation of polynomials. 

3. Let P be the set of points of the plane. For p, q ∈ P define p * q to be
the midpoint of the segment from p to q.

(i) Is * a closed operation on P? 

(ii) Is * commutative? 

(iii) Is * associative? 

Substantiate your answers. 

4. Let a mapping T₁ of the points of the real coordinate axis into itself
be defined by

xT₁ = a₁ + b₁x,

where a₁ and b₁ are fixed real numbers with b₁ ≠ 0. Another mapping T₂ of
this form would be defined by

xT₂ = a₂ + b₂x, b₂ ≠ 0.

We define the “product” T₁ * T₂ of such mappings by

x(T₁ * T₂) = (xT₁)T₂.

Prove that 

(i) * is associative, 

(ii) an identity of * exists, 

(iii) each T has an inverse, 

(iv) * is not commutative.

Interpret each such mapping geometrically as a change of coordinates on the
real line, after considering the two special cases

a arbitrary, b = 1;
a = 0, b arbitrary.


§1.6. Fields

To illustrate the discussion of § 1.5 let us turn our attention to several 
familiar systems of numbers — rational numbers, real numbers, and complex 
numbers — which occupy a central position in many phases of mathematics. 
In all of these number systems the four fundamental operations of arithmetic 
can be performed, and many algebraic properties are shared in common. Our 
aim is to extract these common features and thereby to construct an abstract 
system which is a useful generalization of each of the three concrete examples. 

We begin by recognizing that the three examples comprise different number 
systems; that is, the set of elements is different in each case. Thus for the 
abstract representation, we consider an unspecified nonvoid set F of elements; 
we may call any such element a “number” if we wish, but we take the position 
that the only properties these objects possess are those which we specify in 
the stated postulates and their logical consequences. Next we turn our
attention to the arithmetic operation of addition, which in the abstract system
we denote by the symbol ⊕. In each example, the sum of two numbers of a
given type is again a number of that type; hence ⊕ denotes a closed operation
on F. Of the many general properties of addition we select the following as
postulates.

A1. ⊕ is associative.

A2. ⊕ is commutative.

A3. There exists in F an identity element o, relative to ⊕.

A4. For each element a ∈ F there exists in F an inverse element a_, relative
to ⊕.

In each of the three concrete examples, subtraction can be defined in terms 
of addition, so we make no specific assumptions about subtraction in the 
abstract system; our definition of subtraction would take the form 

a ⊖ b = a ⊕ b_

in imitation of the known examples. 
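
As a concrete illustration, postulates A1 to A4 and the derived definition of subtraction can be spot-checked for the rational numbers. The sketch below assumes Python's exact Fraction type as the model of the rationals and a small, arbitrarily chosen sample; a finite sample illustrates the postulates but does not prove them.

```python
from fractions import Fraction
from itertools import product

# A small sample of rational numbers (an assumption for illustration).
sample = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
zero = Fraction(0)          # the additive identity

def add(a, b):
    return a + b

def neg(a):
    """The additive inverse of a."""
    return -a

def sub(a, b):
    """Subtraction defined through addition: a minus b is a plus inverse of b."""
    return add(a, neg(b))

a1 = all(add(a, add(b, c)) == add(add(a, b), c)
         for a, b, c in product(sample, repeat=3))        # associativity
a2 = all(add(a, b) == add(b, a)
         for a, b in product(sample, repeat=2))           # commutativity
a3 = all(add(zero, a) == a == add(a, zero) for a in sample)   # identity
a4 = all(add(a, neg(a)) == zero == add(neg(a), a) for a in sample)  # inverses

print(a1, a2, a3, a4, sub(Fraction(1, 2), Fraction(1, 3)))
```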

Multiplication in the general system will be denoted ⊙; it is a closed
operation on F and possesses properties similar to those we have postulated
for ⊕, with one notable exception: the number zero does not have a
multiplicative inverse. Hence we adopt the following postulates.

linear combinations of the void set is defined to be the set consisting of the
zero vector alone. Thus [Φ] = [0] by definition. By adopting these conventions
for the void set, we can obtain a consistent theory of vector spaces in
which many theorems concerning nonvoid sets of vectors remain valid even 
for the void set. The following is an example. 

Theorem 2.6. Any subset of an independent set is independent, and
any set containing a dependent subset is dependent.

proof: Exercise. 

On many occasions we shall need to construct linearly independent sets 
of vectors. We can begin by choosing any nonzero vector, but how should 
the next choice be made? The following theorem provides an answer. 

Theorem 2.7. Let S ⊆ V be linearly independent and let T = [S]. Then
for any vector ξ, S ∪ {ξ} is linearly independent if and only if ξ ∉ T.

proof: Let S, T, and ξ be as described. First suppose that ξ ∈ T;
then for suitable scalars aᵢ and vectors αᵢ ∈ S,

ξ = a₁α₁ + a₂α₂ + ⋯ + aₖαₖ.

If aᵢ = 0 for each i, then ξ = 0 and S ∪ {ξ} is linearly dependent. If
some aᵢ ≠ 0, again S ∪ {ξ} is linearly dependent. Conversely, suppose
that ξ ∉ T, and consider any equation of the form

b₁α₁ + b₂α₂ + ⋯ + bₘαₘ + bₘ₊₁ξ = 0,

where each αᵢ ∈ S. If bₘ₊₁ ≠ 0, then ξ is a linear combination of vectors
of S, and so ξ ∈ T, contrary to our assumption. Hence bₘ₊₁ = 0, and
since S is linearly independent, bᵢ = 0 for all i ≤ m. Then S ∪ {ξ} is
linearly independent.

Thus a linearly independent set S of vectors may be extended to a larger 
linearly independent set by adjoining any vector which does not lie in the 
space spanned by S; of course it is possible that no such vector exists, since 
S might span V. The next two theorems concern the selection of a linearly 
independent subset of a dependent set of vectors. 
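
The extension criterion of Theorem 2.7 can be tried out on n-tuples of rational numbers. The sketch below is one possible way to test it, assuming that independence is decided by computing a rank with Gaussian elimination (a technique not developed in this section); the names rank, is_independent, and in_span are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of n-tuples, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    return rank(vectors) == len(vectors)

def in_span(xi, vectors):
    """True if xi lies in the space spanned by vectors."""
    return rank(vectors + [xi]) == rank(vectors)

S = [(1, 0, 1, 0), (0, 1, 0, 1)]      # an independent set
inside = (1, 1, 1, 1)                 # the sum of the two vectors of S
outside = (-1, 0, 2, 0)               # not a combination of them

print(in_span(inside, S), is_independent(S + [inside]))
print(in_span(outside, S), is_independent(S + [outside]))
```

In both cases the enlarged set is independent exactly when the adjoined vector lies outside the span of S, as the theorem asserts.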

Theorem 2.8. Let S = {α₁, …, αₖ} be a finite set of nonzero vectors.
Then S is dependent if and only if

αₘ ∈ [α₁, …, αₘ₋₁]

for some m ≤ k.

proof: If for some m, αₘ ∈ [α₁, …, αₘ₋₁], then {α₁, …, αₘ} is
dependent and so is S (Theorem 2.6). Conversely, suppose S is dependent
and let m be the least integer such that {α₁, …, αₘ} is dependent.
Then for suitable scalars c₁, …, cₘ, not all zero,

Σᵢ₌₁ᵐ cᵢαᵢ = 0.

If cₘ = 0, then {α₁, …, αₘ₋₁} is dependent, contradicting the definition
of m. Hence

αₘ = −cₘ⁻¹(c₁α₁ + ⋯ + cₘ₋₁αₘ₋₁) ∈ [α₁, …, αₘ₋₁].

Theorem 2.9. If T ≠ [0] is spanned by the set S = {α₁, …, αₖ}, there
exists a linearly independent subset of S which also spans T.

first proof: If S is independent, there is nothing to prove.
Otherwise, by Theorem 2.8 there is a least integer i such that αᵢ ∈
[α₁, …, αᵢ₋₁]. Let S₁ = S − {αᵢ}. Clearly T = [S₁], and the argument
can be repeated on S₁. Either S₁ is independent, in which case the proof
is complete, or for some j, S₂ = S₁ − {αⱼ} spans T. The theorem follows
by repeating the argument a finite number of times.

second proof: Let α_{r₁} ≠ 0 and let T₁ = [α_{r₁}]. Then T₁ ⊆ T, and
equality holds if αᵢ ∈ T₁ for i = 1, …, k, in which case the proof is
complete. Otherwise, by Theorem 2.7, {α_{r₁}, α_{r₂}} is linearly independent
for some α_{r₂}. Let T₂ = [α_{r₁}, α_{r₂}]. Clearly T₂ ⊆ T, and the argument
can be repeated to construct a linearly independent subset of S which spans T.

In each of these proofs we constructed an independent subset S' of S such
that [S'] = [S]. Thus for every ξ ∈ T, {S', ξ} is dependent, and if S'' is
any set of vectors of T which contains S' as a subset, either S'' = S' or S'' is
dependent. An independent set which has the property that it cannot be
extended in T to a larger independent set is called a maximal independent
subset of T. This concept is used in the next section.
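
The construction in the second proof of Theorem 2.9 amounts to a greedy selection: run through the spanning set and keep a vector only if it is not already in the span of those kept so far. The sketch below illustrates this for rational n-tuples; the rank routine (Gaussian elimination) and the sample vectors are assumptions made for the example.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of n-tuples, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent_subset(vectors):
    """Keep each vector that is not in the span of the vectors already kept."""
    chosen = []
    for v in vectors:
        if rank(chosen + [v]) > len(chosen):   # v lies outside [chosen]
            chosen.append(v)
    return chosen

# Hypothetical spanning set: the second vector is twice the first,
# and the fourth is the sum of the first and third.
spanning = [(1, 2, 0), (2, 4, 0), (0, 1, 1), (1, 3, 1)]
subset = maximal_independent_subset(spanning)

print(subset, rank(subset) == rank(spanning))
```

The selected subset depends on the order in which the vectors are examined, which reflects the remark that a maximal independent subset is not unique.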

Corollary. Any finite set of nonzero vectors contains a maximal inde- 
pendent subset. 

This corollary may be strengthened by dropping the word “finite,” but
we have no need here for the stronger result, and shall pursue the idea no 
further. 


Exercises 

1. (i) Referring to the vectors of Exercise 4, § 2.3, select an independent
subset of {α, β, γ, δ} containing three vectors.

(ii) Prove that the vectors of your answer to (i) form a maximal
independent subset of V.

(iii) Still referring to Exercise 4, § 2.3, choose any nonzero vector
ξ₁ ∈ S ∩ T. Find vectors ξ₂ and ξ₃ such that S = [ξ₁, ξ₂] and T = [ξ₁, ξ₃].
Prove that [ξ₁, ξ₂, ξ₃] = S + T.

2. Select a maximal linearly independent subset of each of the following 
sets of vectors. 

(i) (1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 1), (-1, 0, 2, 0). 

(ii) (0, 1, 2, 3), (3, 0, 1, 2), (2, 3, 0, 1), (1, 2, 3, 0). 

(iii) (1, -1, 1, -1), (-1, 1, -1, -1), (1, -1, 1, -2), (0, 0, 0, 1).

3. For each example of the preceding exercise in which the largest
independent subset contains fewer than four vectors, adjoin vectors
(a₁, a₂, a₃, a₄) to obtain a linearly independent set of four vectors.

4. Prove Theorem 2.5. 

5. In the space of real n-tuples prove that the vectors

α₁ = (1, 1, 1, …, 1, 1),
α₂ = (0, 1, 1, …, 1, 1),
α₃ = (0, 0, 1, …, 1, 1),
⋮
αₙ = (0, 0, 0, …, 0, 1),

are linearly independent.


§2.5. Basis

At the beginning of the chapter, when we first discussed the physical con- 
cept of a vector in three-dimensional space, our description was made in 
terms of that ordered triple of numbers which specified the coordinates of 
the end point of the “arrow,” relative to a rectangular coordinate system. 
We shall now examine this idea more carefully. First we observe that 

(a₁, a₂, a₃) = a₁(1, 0, 0) + a₂(0, 1, 0) + a₃(0, 0, 1)
             = a₁ε₁ + a₂ε₂ + a₃ε₃,

where εᵢ, i = 1, 2, 3, represents the triple whose ith component is 1 and whose
other components are zero. Clearly, {ε₁, ε₂, ε₃} spans the space and is linearly
independent. Therefore, this set is a maximal linearly independent subset of
the space. These three vectors are unit vectors along three coordinate axes,
and every point in the space acquires a unique system of coordinates relative
to these vectors, the coordinates being the three scalars used to represent a
given vector as a linear combination of the εᵢ.

From the proofs of Theorem 2.9 it is clear that there is nothing unique about 
the way we might choose a maximal independent subset of a space. Many 
such subsets exist for three-dimensional space, and indeed for any space
V ≠ [0]. For example, a second maximal linearly independent set in
three-dimensional space is

β₁ = (0, 1, 1),
β₂ = (1, 0, 1),
β₃ = (1, 1, 0),

and if we let s = ½(a₁ + a₂ + a₃) we have the linear representation

(a₁, a₂, a₃) = (s − a₁)β₁ + (s − a₂)β₂ + (s − a₃)β₃.

The scalars of this representation are uniquely determined, but they are
different from the scalars which represent the same point relative to the εᵢ
vectors. We now turn to a general consideration of these observations.
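
The representation relative to the vectors (0, 1, 1), (1, 0, 1), (1, 1, 0) is easy to verify numerically. The following is a minimal sketch; the function names and the sample point (3, -1, 4) are assumptions chosen for illustration.

```python
from fractions import Fraction

# The second basis from the text.
beta = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]

def coords_beta(a):
    """Scalars (s - a1, s - a2, s - a3), where s is half the sum of the a's."""
    s = Fraction(sum(a), 2)
    return [s - ai for ai in a]

def combine(coeffs, basis):
    """The linear combination of the basis vectors with the given scalars."""
    return tuple(sum(c * b[j] for c, b in zip(coeffs, basis))
                 for j in range(3))

a = (3, -1, 4)                 # hypothetical sample point
coords = coords_beta(a)        # coordinates relative to the beta vectors

print(coords, combine(coords, beta) == a)
```

Relative to the unit vectors the same point has coordinates (3, -1, 4) itself, so the two bases do assign different scalars to one and the same vector.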

Definition 2.9. A maximal linearly independent subset of a vector
space V is called a basis of V. If V contains a finite basis, V is said to be
finite-dimensional; otherwise V is infinite-dimensional.

In this book we shall devote our attention to finite-dimensional spaces 
except for occasional comments and examples to illustrate similarities or 
differences between the two types of spaces. 

Since a basis for V spans V, every vector ξ ∈ V is a linear combination of
basis vectors,

ξ = c₁α₁ + ⋯ + cₖαₖ.

The scalars in this representation are unique; for suppose

ξ = b₁α₁ + ⋯ + bₖαₖ.

Then 0 = ξ − ξ = (c₁ − b₁)α₁ + ⋯ + (cₖ − bₖ)αₖ, and since the αᵢ are
linearly independent, cᵢ − bᵢ = 0 for every i = 1, …, k. We have proved
the following theorem.

Theorem 2.10. If V ≠ [0], every vector of V has a unique representation
as a linear combination of the vectors of a fixed basis of V.

This theorem reveals a significant interpretation of bases. Given a basis 
{α₁, …, αₙ}, each vector ξ can be represented in one and only one way as a
linear combination of basis vectors:

ξ = c₁α₁ + ⋯ + cₙαₙ.

Thus ξ determines uniquely an n-tuple of scalars, (c₁, …, cₙ), which can be
regarded as the coordinates of ξ relative to the α-basis. Hence each basis of
V determines a system of coordinates for V. This suggests that any
n-dimensional vector space is not unlike the space of all n-tuples of scalars,
a fact which we later prove. We shall make repeated use of this interpretation
of a basis as a coordinate system for V.
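
For n-tuples of rational numbers, the coordinates of a vector relative to a given basis can be computed by solving a system of linear equations. The sketch below uses Gauss-Jordan elimination, which is an assumption of this example rather than a method introduced so far; the function name coordinates and the sample basis (1, 1), (-2, 1) are likewise illustrative.

```python
from fractions import Fraction

def coordinates(xi, basis):
    """Solve c1*a1 + ... + cn*an = xi for the scalars (c1, ..., cn).

    basis must consist of n linearly independent n-tuples.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, last column holds xi.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(xi[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [x / lead for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

basis = [(1, 1), (-2, 1)]          # hypothetical basis of the plane
xi = (4, 1)
c = coordinates(xi, basis)

print(c)   # the unique scalars with c[0]*(1, 1) + c[1]*(-2, 1) == (4, 1)
```

If the given vectors are dependent, no pivot can be found in some column and the function raises an error, which mirrors the fact that uniqueness of the representation depends on independence.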


Examples of Bases 

(a) Consider the space of ordered pairs of real numbers, represented
geometrically by the cartesian plane. The unit vectors (vectors of unit length
along the chosen x- and y-axes) are ε₁ = (1, 0) and ε₂ = (0, 1), respectively.
These two vectors form a basis since they are independent, and (x, y) =
xε₁ + yε₂, so that ε₁ and ε₂ span the space. But also, α₁ = (a, 0) and
α₂ = (0, b) form a basis for any a and b different from zero. Furthermore,
β₁ = (1, 1) and β₂ = (−2, 1) form a basis. As shown in Figure 2.5, any pair
of vectors which do not lie on the same line forms a basis for the plane.

[Figure 2.5]

(b) More generally for the space of all real n-tuples, let εᵢ be the n-tuple
whose ith component is 1, the other components being zero, i = 1, 2, …, n.
The εᵢ are linearly independent, and (a₁, …, aₙ) = a₁ε₁ + ⋯ + aₙεₙ, so the
εᵢ form a basis. Throughout this book we shall reserve the symbols εᵢ to
represent the vectors of this particular basis, and reserve the symbol ℰₙ to
represent the space of real n-tuples with this choice of basis.

(c) Consider the space of all real polynomials of degree not exceeding
a fixed natural number n. Then x⁰ = 1, x¹, …, xⁿ form a basis.

(d) An example of an infinite-dimensional space is the space of all real
polynomials. Each polynomial is of finite degree but we include all finite
degrees. The polynomials xᵏ, k = 0, 1, 2, …, form a basis.


We have seen that any vector space has many bases; however, every basis 
has the important property stated in the following theorem. 


Theorem 2.11. Every basis for a finite-dimensional vector space V has
the same number of elements.

proof: Let A = {α₁, …, αₖ} and B = {β₁, …, βₘ} be bases for V.
Each set is a maximal independent set, so B₁ = {α₁, β₁, β₂, …, βₘ} is
dependent. By Theorem 2.8 some βᵢ is a linear combination of the
vectors which precede it, and there exists a subset B₁' of B₁ which
contains α₁ as the first vector and which is a basis for V. Then
B₂ = {α₂, B₁'} is dependent, and some vector is a linear combination of the
ones which precede it. This vector cannot be α₁ or α₂ since A is linearly
independent. Hence there exists a subset B₂' of B₂ which contains α₂ and α₁
as the first and second vectors and which is a basis for V. Let
B₃ = {α₃, B₂'} and repeat the argument. If all the βᵢ are removed in this way
before k steps, we obtain the basis Bⱼ' = {αⱼ, …, α₁} for j < k, which
contradicts the independence of A, since αₖ ∈ [Bⱼ']. Hence k steps are
required to remove all of the βᵢ, one or more at a time, so k ≤ m. Reversing
the roles of A and B in the replacement process, we obtain m ≤ k,
so the proof is complete.

Exercises 

1. Verify that the several sets described in Examples (a) to (d) of this
section are bases. 

2. Let {α₁, …, αₙ} be a basis for V, and let cᵢ be arbitrary nonzero
scalars, i = 1, 2, …, n. Prove that {c₁α₁, c₂α₂, …, cₙαₙ} is a basis for V.
Interpret geometrically for n = 3.

3. Show that any three points which do not lie in a plane through the 
origin determine a basis for three-dimensional space. 

4. Beginning with α₁ = (−1, 1, 2), construct two bases for the space of
all real triples in such a way that if {α₁, α₂, α₃} and {α₁, α₂', α₃'} are the two
bases, then {α₂, α₃, α₃'} is also a basis.

5. Given the basis α₁ = (1, 1, 1, 1), α₂ = (0, 1, 1, 1), α₃ = (0, 0, 1, 1),
α₄ = (0, 0, 0, 1), express each vector εᵢ, i = 1, 2, 3, 4, as a linear combination
of the α's. Likewise express each αᵢ as a linear combination of the ε's.

6. Let {α₁, …, αₙ} and {β₁, …, βₙ} be two bases for the space of all
real n-tuples. Define the mapping T of the space into itself by the statement,

if ξ = Σᵢ₌₁ⁿ cᵢαᵢ, then ξT = Σᵢ₌₁ⁿ cᵢβᵢ.

Verify that

(i) T maps αᵢ onto βᵢ, i = 1, …, n,

(ii) T is a one-to-one mapping,

(iii) (ξ + η)T = ξT + ηT,

(iv) (kξ)T = k(ξT).

7. Let V be a finite-dimensional vector space.

(i) Prove that if {α₁, …, αₙ} is a basis for V, if S = [α₁, …, αₖ],
and if T = [αₖ₊₁, …, αₙ], then V = S ⊕ T.

(ii) Prove, conversely, that if S and T are any subspaces of V such
that V = S ⊕ T, if {α₁, …, αₖ} is a basis for S, and if {β₁, …, βₘ} is a
basis for T, then {α₁, …, αₖ, β₁, …, βₘ} is a basis for V.


§2.6. Dimension 

Now that the number of elements in any basis for V has been shown to be
unique, we use this number as a definition of the dimension of a vector space.

Definition 2.10. The dimension of a finite-dimensional vector space is
the number of vectors in any basis. The dimension of V is denoted d(V).

By definition, a set of vectors is a basis of V if and only if two conditions
are satisfied:

1. the set must be linearly independent, and

2. the set must span V.

However, for an n-dimensional space and a set of n vectors, these two conditions
turn out to be equivalent. This result (Theorem 2.13) simplifies the
task of verifying that a given set is a basis. We first prove another useful 
theorem. 

Theorem 2.12. Any linearly independent set of vectors in an n-dimen- 
sional space V can be extended to a basis. 

proof: Let {α₁, …, αₖ} be linearly independent, and let {β₁, …, βₙ}
be a basis. Let Tₖ = [α₁, …, αₖ]. If βᵢ ∈ Tₖ for i = 1, …, n, then
Tₖ = V. Otherwise, for some j, βⱼ ∉ Tₖ, so by Theorem 2.7 {α₁, …, αₖ, βⱼ}
is independent. Thus the original set has been extended to a larger
independent set, and the theorem follows by repeating the argument until
the enlarged set spans V.

Theorem 2.13. Let A = {α₁, …, αₙ} be an arbitrary set of n vectors
of an n-dimensional space V.

(a) A is a basis for V if and only if A is linearly independent. 

(b) A is a basis for V if and only if [A] = V. 

proof: If A is linearly independent, A may be extended to a basis by
Theorem 2.12. But a basis contains only n vectors, so A is a basis. To
prove the second statement, suppose V = [A]. By Theorem 2.9 a linearly
independent subset of A also spans V and hence is a basis. But any basis 
contains n vectors, so A itself must be that subset. The “only if” state- 
ments of (a) and (b) are valid by the definition of a basis.

If S and T are subspaces of V, what can be said about the dimensions of
the spaces S + T and S ∩ T (§ 2.3)? A partial answer is available.

Theorem 2.14. d(S + T) + d(S ∩ T) = d(S) + d(T).

proof: Of the four subspaces involved in this theorem, S ∩ T is a
subspace of each of the others, and both S and T are subspaces of S + T.
Our proof begins with the choice of a basis {α₁, …, αₖ} for S ∩ T, where
k = d(S ∩ T). Then d(S) = k + i and d(T) = k + j for some nonnegative
integers i and j. The basis for S ∩ T can be extended to a basis
{α₁, …, αₖ, βₖ₊₁, …, βₖ₊ᵢ} for S. A different extension similarly
produces a basis {α₁, …, αₖ, γₖ₊₁, …, γₖ₊ⱼ} for T. Combining these two
bases gives a set {α₁, …, αₖ, βₖ₊₁, …, βₖ₊ᵢ, γₖ₊₁, …, γₖ₊ⱼ} of k + i + j
vectors. The theorem follows immediately when it is proved that this
set is a basis for S + T, which is left as an exercise. (Remember that
since the dimension of S + T is not yet known to be k + i + j, it must
be shown that this set is linearly independent and spans S + T.)

It should be noticed that Theorem 2.14 is similar in form to Exercise 7 of
§ 1.2, concerning the number of elements in the union and intersection of
finite sets. In geometric terms, Theorem 2.14 proves, for example, that in 
three-dimensional space any two distinct planes through the origin intersect 
in a line through the origin. 
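
The geometric statement about two planes can be checked by computing each of the four dimensions separately. The sketch below works with rational triples and assumes two techniques not developed in this section: row reduction to find dimensions, and the solution of a homogeneous system to find vectors common to both subspaces. All function names and the two sample planes are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (rows, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def dim(vectors):
    """Dimension of the space spanned by the given vectors."""
    return len(rref(vectors)[1])

def intersection_spanning_set(s_basis, t_basis):
    """Vectors spanning the intersection, given bases of S and of T."""
    # Unknowns (a, b) with sum a_i s_i - sum b_j t_j = 0, one equation per coordinate.
    cols = list(s_basis) + [tuple(-x for x in t) for t in t_basis]
    n = len(cols[0])
    m, pivots = rref([[c[i] for c in cols] for i in range(n)])
    result = []
    for free in (j for j in range(len(cols)) if j not in pivots):
        sol = [Fraction(0)] * len(cols)
        sol[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            sol[pc] = -row[free]
        a = sol[:len(s_basis)]          # the vector sum a_i s_i lies in S and in T
        result.append(tuple(sum(ai * s[i] for ai, s in zip(a, s_basis))
                            for i in range(n)))
    return result

S = [(1, 0, 0), (0, 1, 0)]   # hypothetical plane through the origin
T = [(0, 1, 0), (0, 0, 1)]   # a second, distinct plane
common = intersection_spanning_set(S, T)

print(dim(S + T), dim(common), dim(S), dim(T))
```

Here S + T, written as list concatenation, is simply a spanning set for the sum of the two subspaces, so its rank is the dimension of the sum.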

Exercises 

1. Prove Theorem 2.14 in detail. 

2. Determine the dimension of each of the five spaces given as examples 
in § 2.2. 

3. Show that if subspaces R and S have the same dimension and if R ⊆ S,
then R = S.

4. Let S be a k-dimensional subspace of the n-dimensional space V, k > 1.

(i) Show that an (n − k)-dimensional subspace T exists such that
S ∩ T = [0].

(ii) Deduce that S ⊕ T = V.

(iii) Show that T is not uniquely determined by S and V.

5. (i) Referring to Exercise 5, § 2.3, if necessary, prove that for
subspaces R, S, T of V,

d(R + S + T) ≤ d(R) + d(S) + d(T) − d(R ∩ S) − d(R ∩ T) − d(S ∩ T) + d(R ∩ S ∩ T).

(ii) Compare (i) with the corresponding formula for the number of 
elements in finite subsets (Exercise 7(ii), § 1.2). 

(iii) Show that the result of (i) cannot be strengthened to equality
in all cases (Exercise 5, § 2.3).


§2.7. Isomorphism of Vector Spaces

We now turn our attention to a problem which has been constantly in the 
background of our development of vector spaces. We began by making an 
informal description of the spaces ℰ₂ and ℰ₃, which were familiar from our
knowledge of analytic geometry, and then generalized the form of our
observations in order to define an abstract vector space. Abstract vector
spaces are of two types, finite-dimensional or infinite-dimensional, and we
agreed that we shall study only finite-dimensional spaces in this book. But in
considering bases we began to suspect that any n-dimensional space over a
field is essentially the same as the space of n-tuples of field elements.

In order to formulate our suspicions more precisely, we must first agree 
upon a meaning of the phrase “essentially the same.” This is a problem of 
importance for all abstract systems, but we confine our attention here to 
vector spaces; a more general discussion is given in Appendix A. 

Consider two vector spaces,

𝒱 = {V, F; +, ·, ⊕, ⊙}

and

𝒲 = {W, F; +, ·, ⊞, ⊡},

over the same field F. The vectors of the two systems might have different
mathematical names, as suggested by the various examples of vector spaces 
given in § 2.2, and the vector operations of the two systems might be defined 
in different ways. However, suppose we can rename the vectors of the first 
system, assigning to each vector of the first system the name of a vector of 
the second system, with different names being given to distinct vectors. 
Suppose further that the new names are assigned in such a way that the 
vector operations of the renamed first system coincide exactly with the 
corresponding operations of the second system. Then we would agree that 
the two systems are identical twins which are distinguishable by name only, 
and not by behavior. This is what we mean by “essentially the same”; the 
mathematical term for this concept is isomorphism. We first define the more
general notion of homomorphism. 

Definition 2.11. Let

𝒱 = {V, F; +, ·, ⊕, ⊙}

and

𝒲 = {W, F; +, ·, ⊞, ⊡}

be vector spaces over a field F. A mapping H of V into W is called
a homomorphism, provided that for all α, β ∈ V and all a ∈ F,

(α ⊕ β)H = αH ⊞ βH

and

(a ⊙ α)H = a ⊡ αH.

If every vector of W is in the range of H, H is said to be a homomorphism
of V onto W.


Definition 2.12. A one-to-one homomorphism J of V onto W is called
an isomorphism. If such a mapping exists, V and W are said to be
isomorphic.

Thus, to establish that two vector spaces V and W over the same field F
are isomorphic, we need to exhibit a one-to-one mapping of V onto W which
preserves the two operations of vector sum and multiplication of a vector by a
scalar. We are now ready to prove the following theorem.

Theorem 2.15. Any n-dimensional vector space V over F is isomorphic
to the space Fₙ of all n-tuples of elements of F.

proof: Let {α₁, …, αₙ} be a basis for V. By Theorem 2.10 every
ξ ∈ V has a unique representation as a linear combination of the αᵢ:

ξ = c₁α₁ + ⋯ + cₙαₙ.

To each ξ ∈ V we associate the corresponding n-tuple (c₁, …, cₙ) ∈ Fₙ.
This is a mapping of V onto Fₙ, and distinct vectors of V map into
distinct vectors of Fₙ. Furthermore, let ξ = Σᵢ₌₁ⁿ cᵢαᵢ and η = Σᵢ₌₁ⁿ bᵢαᵢ.
Then

ξ + η = Σᵢ₌₁ⁿ (cᵢ + bᵢ)αᵢ → (c₁ + b₁, …, cₙ + bₙ)
      = (c₁, …, cₙ) + (b₁, …, bₙ)

and

kξ = Σᵢ₌₁ⁿ (kcᵢ)αᵢ → (kc₁, …, kcₙ) = k(c₁, …, cₙ).

Hence vector sum and scalar multiplication are preserved by the mapping,
and the systems are isomorphic.

This result tells us that any two vector spaces of the same finite dimension n
are isomorphic, since each is isomorphic to Fₙ. (See Exercise 4, below.) It
allows us to think of any such abstract space in terms of the more familiar 
space of n-tuples. 
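
The coordinate mapping used in the proof of Theorem 2.15 can be made concrete in a small case. The sketch below takes the plane of rational pairs with a hypothetical basis (1, 1), (-2, 1) and computes coordinates by Cramer's rule for two unknowns, which is an assumption of the example and not a method from the text. It then checks that the mapping J preserves vector sum and scalar multiples and can be undone.

```python
from fractions import Fraction

B1, B2 = (1, 1), (-2, 1)        # hypothetical basis of the plane

def det(u, v):
    return u[0] * v[1] - u[1] * v[0]

def J(xi):
    """Coordinates (c1, c2) of xi relative to B1, B2, so xi = c1*B1 + c2*B2."""
    d = Fraction(det(B1, B2))
    return (det(xi, B2) / d, det(B1, xi) / d)

def J_inverse(c):
    """Rebuild the vector from its coordinates."""
    return tuple(c[0] * b1 + c[1] * b2 for b1, b2 in zip(B1, B2))

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def scale(k, u):
    return tuple(k * x for x in u)

xi, eta, k = (4, 1), (0, 3), Fraction(5, 2)

sum_preserved = J(add(xi, eta)) == add(J(xi), J(eta))
scalar_preserved = J(scale(k, xi)) == scale(k, J(xi))
one_to_one_onto = J_inverse(J(xi)) == xi and J(J_inverse((7, -2))) == (7, -2)

print(J(xi), sum_preserved, scalar_preserved, one_to_one_onto)
```

A check on a few sample vectors illustrates the two defining conditions of a homomorphism; the proof above is what establishes them for every vector.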

The isomorphism theorem suggests that for finite-dimensional spaces our 
attempt to obtain generality by giving an abstract definition of vector spaces 
was not wholly successful. Any n-dimensional space over F is isomorphic to
the space of n-tuples of elements of F. However, the n-tuple notation is often
unnecessarily cumbersome, so we prefer to use the general notation for vec- 
tors, remembering that we can represent vectors as n-tuples of field elements, 
without loss of generality, whenever that particular representation proves to 
be convenient. We shall denote by V n a vector space of dimension n with 
any basis. 
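
The coordinate mapping used in the proof of Theorem 2.15 can be tried out numerically. The sketch below is only an illustration under assumptions that are not in the text: the space is taken to be the rational polynomials of degree at most 2, stored as coefficient tuples, with the hypothetical basis {1, 1 + x, 1 + x + x^2}; the helper names are invented for this example.

```python
from fractions import Fraction

# Assumed example space: polynomials a0 + a1*x + a2*x^2 over the rationals,
# stored as (a0, a1, a2). Assumed basis: {1, 1 + x, 1 + x + x^2}.

def coords(p):
    """Coordinates (c1, c2, c3) with p = c1*1 + c2*(1 + x) + c3*(1 + x + x^2)."""
    a0, a1, a2 = p
    return (a0 - a1, a1 - a2, a2)

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(x + y for x, y in zip(u, v))

def scale(k, u):
    """Scalar multiple of a tuple."""
    return tuple(k * x for x in u)

p = (Fraction(3), Fraction(1), Fraction(4))
q = (Fraction(-2), Fraction(5), Fraction(0))
k = Fraction(7, 2)

# The coordinate mapping preserves vector sum and scalar multiple.
sum_preserved = coords(add(p, q)) == add(coords(p), coords(q))
scalar_preserved = coords(scale(k, p)) == scale(k, coords(p))
print(sum_preserved, scalar_preserved)
```

Any other basis would serve equally well; only the formula inside `coords` would change.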

Exercises 

1. Establish an isomorphism between the space of all polynomials of
degree not exceeding n and the space $\mathcal{F}_{n+1}$. What are the images of the basis
vectors under the isomorphism?

2. Let J be an isomorphism of the spaces $\mathcal{V}_n$ and $\mathcal{W}_n$. Let $\{\alpha_1, \ldots, \alpha_k\}$ be
a linearly independent set of vectors in $\mathcal{V}_n$, $k \le n$. Prove that $\{\alpha_1 J, \ldots, \alpha_k J\}$
is linearly independent in $\mathcal{W}_n$. Deduce that the isomorphic image of a basis
is a basis.

3. Let H be a homomorphism of $\mathcal{V}_n$ into $\mathcal{W}_n$. Prove that H is an
isomorphism if and only if, for every $\xi \ne \theta$ in $\mathcal{V}_n$, $\xi H \ne \theta$ in $\mathcal{W}_n$.

4. Suppose that vector spaces $\mathcal{V}$ and $\mathcal{W}$ are both isomorphic to the same
space $\mathcal{U}$. Show how these two isomorphisms can be combined to yield an
isomorphism of $\mathcal{V}$ onto $\mathcal{W}$.

5. Let $\mathcal{V}$ and $\mathcal{W}$ be vector spaces over the same field $\mathcal{F}$, and let X denote
the collection of all homomorphisms from $\mathcal{V}$ into $\mathcal{W}$. The sum of two
homomorphisms is defined by

$$\xi(H_1 + H_2) = \xi H_1 + \xi H_2 \quad \text{for all } \xi \in \mathcal{V}.$$

Also, the product of a scalar and a homomorphism is defined by

$$\xi(aH) = a(\xi H) \quad \text{for all } a \in \mathcal{F},\ \xi \in \mathcal{V}.$$

Show that $H_1 + H_2$ and $aH$ are also homomorphisms from $\mathcal{V}$ into $\mathcal{W}$ and
that X forms a vector space over $\mathcal{F}$ relative to these operations.

6. Referring to Exercise 5, if $\mathcal{W} = \mathcal{V}$, then it is possible to define the
product of two homomorphisms in X in terms of the rule for successive
mappings:

$$\xi(H_1 H_2) = (\xi H_1)H_2 \quad \text{for all } \xi \in \mathcal{V}.$$

Show that $H_1 H_2$ is a homomorphism from $\mathcal{V}$ into $\mathcal{V}$.

7. If we specialize Exercise 5 by choosing $\mathcal{W}$ to be the field $\mathcal{F}$, considered
as a vector space over itself, then the members of X are scalar-valued functions
defined on $\mathcal{V}$ and satisfying the properties of a homomorphism. Any such
function is called a linear functional, and the vector space X of all linear
functionals from $\mathcal{V}$ to $\mathcal{F}$ is called the dual space of $\mathcal{V}$. Determine whether or
not each of the following mappings is a linear functional.
(i) In $\mathcal{F}_n$ the mappings

$$(a_1, \ldots, a_n)H = \sum_i a_i,$$

$$(a_1, \ldots, a_n)H = a_1 + 2.$$

(ii) In the space of all real-valued functions which are differentiable 
on the interval -1 < x < 1, the mappings 

/ n-m 

$$fH = f(0)\,f'(0).$$

(iii) In the space of all real-valued functions which are continuous on
the interval 0 < x < 1, the mappings

/II - fmr'A 
/ii - j'nirn 
/II = r 

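8. Associated with any subspace $\mathcal{S}$ of $\mathcal{V}$ there is a family of homomorphisms
from $\mathcal{V}$ to $\mathcal{S}$ called projections. Each such projection $P_{\mathcal{S}}$ is determined by the
choice of a subspace $\mathcal{T}$ such that $\mathcal{V} = \mathcal{S} \oplus \mathcal{T}$ (Exercise 1, § 2.6). Since each
$\xi \in \mathcal{V}$ has a unique representation $\xi = \sigma + \tau$, where $\sigma \in \mathcal{S}$ and $\tau \in \mathcal{T}$, the
mapping $P_{\mathcal{S}}$ is defined by

$$\xi P_{\mathcal{S}} = \sigma$$

and is called the projection of $\mathcal{V}$ onto $\mathcal{S}$ along $\mathcal{T}$. Similarly, the mapping $P_{\mathcal{T}}$,

$$\xi P_{\mathcal{T}} = \tau,$$

is called the projection of $\mathcal{V}$ onto $\mathcal{T}$ along $\mathcal{S}$. Prove that

(i) $P_{\mathcal{S}}$ and $P_{\mathcal{T}}$ are homomorphisms of $\mathcal{V}$ onto $\mathcal{S}$ and onto $\mathcal{T}$, respectively.

(ii) $P_{\mathcal{S}}$ and $P_{\mathcal{T}}$ are idempotent relative to the operation of successive
mapping (Exercise 6).

(iii) $P_{\mathcal{S}} P_{\mathcal{T}} = Z = P_{\mathcal{T}} P_{\mathcal{S}}$, where $\xi Z = \theta$ for every $\xi \in \mathcal{V}$.

(iv) $P_{\mathcal{S}} + P_{\mathcal{T}} = I$, where $\xi I = \xi$ for every $\xi \in \mathcal{V}$.

§ 3.1. Homomorphisms of Vector Spaces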

A study of vector spaces can be extended beyond the preliminary investi- 
gations of the preceding chapter, and we shall develop further theory as it is 
needed. For the present, however, we turn our attention from the spaces 
themselves to the subject of homomorphisms of vector spaces. In this chapter 
and the next we shall see that such homomorphisms are intimately related 
to matrices. 

We assume that $\mathcal{V}$ and $\mathcal{W}$ are vector spaces over the same field $\mathcal{F}$. From
Definition 2.11, a homomorphism of $\mathcal{V}$ into $\mathcal{W}$ is a mapping H which preserves
the two operations involving vectors; that is, for all $\alpha, \beta \in \mathcal{V}$, $a \in \mathcal{F}$, the
following equations hold in $\mathcal{W}$:

1. $(\alpha + \beta)H = \alpha H + \beta H$,

2. $(a\alpha)H = a(\alpha H)$.

We now show that equations 1 and 2 can be replaced by the single condition

3. $(a\alpha + b\beta)H = a(\alpha H) + b(\beta H)$.

Clearly, equation 1 follows from equation 3 by selecting $a = 1 = b$, and
equation 2 follows by choosing $b = 0$. Conversely,

$$(a\alpha + b\beta)H = (a\alpha)H + (b\beta)H$$

by equation 1, which then reduces to $a(\alpha H) + b(\beta H)$ by equation 2.

Condition 3 is the requirement of linearity, and since a homomorphism is a
mapping, it is reasonable to use geometric language to describe mappings.
Henceforth we shall call such a homomorphism a linear transformation. The
terms linear mapping and linear operator are also used as synonyms.

Definition 3.1. A linear transformation T from a vector space $\mathcal{V}$ to a
vector space $\mathcal{W}$, both over the scalar field $\mathcal{F}$, is a mapping of $\mathcal{V}$ into $\mathcal{W}$
such that for all $\alpha, \beta \in \mathcal{V}$ and for all $a, b \in \mathcal{F}$,

$$(a\alpha + b\beta)T = a(\alpha T) + b(\beta T).$$

This definition is stated more elegantly in the form “a linear transformation
from $\mathcal{V}$ to $\mathcal{W}$ is a homomorphism of $\mathcal{V}$ into $\mathcal{W}$.” We remark that either form
of the definition includes the possibility that $\mathcal{W}$ and $\mathcal{V}$ are the same space.

Before proceeding we call attention to our choice of notation for linear
transformations, which will have important consequences for the notation
which we later adopt for matrices. Since a linear transformation T is a
function from $\mathcal{V}$ to $\mathcal{W}$ we could use standard functional notation in which
$T(\alpha)$ represents the “value” of the function T at the vector $\alpha$. But, as we
observed in § 1.4, if we wish to emphasize the geometric character of T as a
mapping, and particularly if successive mapping (function of a function) is
considered often, the notation $\alpha T$ is a useful substitute for $T(\alpha)$. In this
respect the notation for linear transformations is not standardized, and you
are advised when consulting other books to ascertain whether the author uses
$T(\alpha)$ (left-hand notation) or $\alpha T$ (right-hand notation) for linear
transformations and matrices. Right-hand notation is used in this book. However, to
facilitate the translation of major results from one notational system to the
other, a summary of results in both systems is given in Appendix B.

We shall regard linear transformations as the elements of an abstract system 
whose nature is to be investigated. First we need to decide upon the relations 
and operations of the system. Since linear transformations are functions, we 
accept the notion of equality of functions as a definition of equality of linear 
transformations. 

Definition 3.2. Two linear transformations $T_1$ and $T_2$ from $\mathcal{V}$ to $\mathcal{W}$
are said to be equal if and only if $\alpha T_1 = \alpha T_2$ for all $\alpha \in \mathcal{V}$.

This means that equal transformations determine the same mapping of the
vectors of $\mathcal{V}$ into vectors of $\mathcal{W}$, and that a linear transformation is determined
by its effect on the vectors of $\mathcal{V}$. We use this method of description to define
operations on linear transformations.

Definition 3.3. The sum $T_1 \oplus T_2$ and scalar multiple $c \odot T_1$ of linear
transformations from $\mathcal{V}$ to $\mathcal{W}$ are defined, respectively, by

(a) $\alpha(T_1 \oplus T_2) = \alpha T_1 + \alpha T_2$, all $\alpha \in \mathcal{V}$,

(b) $\alpha(c \odot T_1) = c(\alpha T_1)$, all $\alpha \in \mathcal{V}$, $c \in \mathcal{F}$.

Thus, given a field $\mathcal{F}$ and two vector spaces $\mathcal{V}$ and $\mathcal{W}$ over $\mathcal{F}$, the set L of all
linear transformations forms a system $\mathcal{L}$ for which two operations are defined.

Sum: $T_1 \oplus T_2$, all $T_1, T_2 \in L$.

Scalar multiple: $c \odot T_1$, all $c \in \mathcal{F}$, $T_1 \in L$.

Theorem 3.1. Let $\mathcal{F}$ be a field, $\mathcal{V}$ and $\mathcal{W}$ vector spaces over $\mathcal{F}$, and L the
set of all linear transformations from $\mathcal{V}$ to $\mathcal{W}$. The system

$$\mathcal{L} = \{L, \mathcal{F};\ +, \oplus, \odot\}$$

is a vector space over $\mathcal{F}$.

proof: Exercise. Except for minor changes of terminology, this is 
Exercise 5, § 2.7. 

Most of our attention in the rest of this book will be devoted to the study
of properties of the space $\mathcal{L}$ and its elements. Generally we shall consider
linear mappings from one vector space $\mathcal{V}$ to another space $\mathcal{W}$. But frequently
we shall specialize this general study by choosing $\mathcal{W}$ in one of two ways:
either $\mathcal{W} = \mathcal{V}$ or $\mathcal{W} = \mathcal{F}$, where we regard the field $\mathcal{F}$ as a vector space over
itself. These special cases are investigated in § 3.6 and § 3.4, respectively.

Definition 3.4. Let $\mathcal{V}$, $\mathcal{W}$, and $\mathcal{Y}$ be vector spaces over $\mathcal{F}$. Let $T_1$ be a
linear transformation from $\mathcal{V}$ to $\mathcal{W}$, and let $T_2$ be a linear transformation
from $\mathcal{W}$ to $\mathcal{Y}$. Then the product transformation $T_1 \,\square\, T_2$ is the mapping
from $\mathcal{V}$ to $\mathcal{Y}$ defined by

$$\alpha(T_1 \,\square\, T_2) = (\alpha T_1)T_2 \quad \text{for every } \alpha \in \mathcal{V}.$$

It is easily verified that $T_1 \,\square\, T_2$ is linear. We note in particular that if
we restrict our attention to the set L of linear transformations from $\mathcal{V}$ into
$\mathcal{V}$ itself, then in addition to sum $\oplus$ and scalar multiple $\odot$, a third
operation $\square$ is defined for the system $\mathcal{L}$. Since $\square$ is the successive mapping
operation, it is associative. We shall return to this point in § 3.6.

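Several special linear transformations merit our attention. The zero linear
transformation Z is defined from $\mathcal{V}$ to $\mathcal{W}$ by

$$\alpha Z = \theta \quad \text{for every } \alpha \in \mathcal{V}.$$

Corresponding to each linear transformation T from $\mathcal{V}$ to $\mathcal{W}$, there is the
negative linear transformation, denoted $-T$ and defined by

$$\alpha(-T) = -\alpha T \quad \text{for every } \alpha \in \mathcal{V}.$$

For each space $\mathcal{V}$ there is an identity linear transformation I from $\mathcal{V}$ onto $\mathcal{V}$,
defined by

$$\alpha I = \alpha \quad \text{for every } \alpha \in \mathcal{V}.$$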

Of course, if T is a linear transformation from $\mathcal{V}$ to $\mathcal{W}$, then both $I \,\square\, T$ and
$T \,\square\, I$ are defined, but in the former I represents the identity mapping on $\mathcal{V}$
and in the latter I represents the identity mapping on $\mathcal{W}$. You should verify
that these names are justified; that is, that

$$T \oplus Z = T \quad \text{for every } T,$$

$$T \oplus (-T) = Z \quad \text{for every } T,$$

$$T \,\square\, I = I \,\square\, T = T \quad \text{for every } T.$$
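
The operations of Definitions 3.3 and 3.4 and the three identities just listed can be checked on a small case. The following is a minimal sketch, assuming transformations of pairs of integers modeled as Python functions; the helper names and the sample map `T` are hypothetical, and agreement is only tested on a few sample vectors (which include a basis).

```python
def t_sum(T1, T2):
    """Sum: alpha(T1 (+) T2) = alpha T1 + alpha T2."""
    return lambda a: tuple(x + y for x, y in zip(T1(a), T2(a)))

def t_scale(c, T):
    """Scalar multiple: alpha(c (.) T) = c(alpha T)."""
    return lambda a: tuple(c * x for x in T(a))

def t_prod(T1, T2):
    """Product in right-hand notation: apply T1 first, then T2."""
    return lambda a: T2(T1(a))

Z = lambda a: (0, 0)                        # zero transformation
I = lambda a: a                             # identity transformation
T = lambda a: (a[0] + 2 * a[1], 3 * a[0])   # an assumed sample linear map
neg_T = t_scale(-1, T)                      # the negative of T

samples = [(1, 0), (0, 1), (2, -5)]

def same(A, B):
    """True if A and B agree on every sample vector."""
    return all(A(a) == B(a) for a in samples)

print(same(t_sum(T, Z), T))        # T (+) Z = T
print(same(t_sum(T, neg_T), Z))    # T (+) (-T) = Z
print(same(t_prod(T, I), T), same(t_prod(I, T), T))
```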

Once again we simplify our notation by using customary symbols for the 
operations on linear transformations. 

Examples of Linear Transformations 

(a) The space $\mathcal{E}_2$ of pairs of real numbers is represented geometrically by
the plane. The transformation defined by $(x, y)T = (kx, ky)$ for fixed scalar k
is a linear transformation which maps each point P into the point Q which
is collinear with P and the origin and k times as far from the origin as P is.



(b) Again in $\mathcal{E}_2$, the transformation defined by

$$(x, y)T = (x\cos\Psi - y\sin\Psi,\ x\sin\Psi + y\cos\Psi)$$

is a linear transformation which rotates each point of the plane about the
origin and through the angle $\Psi$.

Figure 3.2

(c) In $\mathcal{E}_2$ let $T_1$, $T_2$, and $T_3$ be defined by

$$(x, y)T_1 = (x, 0),$$
$$(x, y)T_2 = (0, y),$$
$$(x, y)T_3 = (y, x).$$

Figure 3.3


All of these transformations are linear. $T_1$ is a projection of each point of the
plane onto the x-axis; $T_2$ is a projection onto the y-axis; $T_3$ is a reflection
across the line $y = x$. Observe that $(x, y)T_1T_2 = (x, 0)T_2 = (0, 0)$, so $T_1T_2 = Z$
but $T_1 \ne Z$ and $T_2 \ne Z$. Hence a product of nonzero transformations can
be the zero transformation. Also, $(x, y)T_2T_3 = (0, y)T_3 = (y, 0)$; however,
$(x, y)T_3T_2 = (y, x)T_2 = (0, x)$. Hence $T_2T_3 \ne T_3T_2$, so the multiplication
of transformations is not commutative. Finally, observe that $(x, y)T_1T_1 =
(x, 0)T_1 = (x, 0) = (x, y)T_1$, so that $T_1^2 = T_1$. Thus there exist idempotent
transformations other than I and Z.
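
The three observations about Example (c) can be replayed on a sample point. This is a small sketch with the three maps written as Python functions; the helper `prod` is an illustrative name that composes in right-hand order (first factor applied first).

```python
T1 = lambda v: (v[0], 0)       # projection onto the x-axis
T2 = lambda v: (0, v[1])       # projection onto the y-axis
T3 = lambda v: (v[1], v[0])    # reflection across the line y = x

def prod(A, B):
    """Product AB in right-hand notation: v(AB) = (vA)B."""
    return lambda v: B(A(v))

v = (3, 7)
zero_product = prod(T1, T2)(v)                       # image under T1 T2
order_matters = (prod(T2, T3)(v), prod(T3, T2)(v))   # T2 T3 versus T3 T2
idempotent = prod(T1, T1)(v) == T1(v)                # T1 T1 agrees with T1 at v
print(zero_product, order_matters, idempotent)
```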

(d) In the space of polynomials P of degree not exceeding n, let

$$P(x)D = \frac{d}{dx}P(x).$$

Familiar properties of the derivative show that D is linear. Observe also
that $P(x)D^{n+1} = 0$ for every polynomial in the space, so $D^{n+1} = Z$. Thus
there exist nonzero transformations T such that a finite power of T is Z. A
transformation T is called nilpotent of index k if $T^k = Z$ but $T^{k-1} \ne Z$.
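
Nilpotency of D in Example (d) can be observed directly. The sketch below assumes polynomials of degree at most n stored as coefficient lists [a0, a1, ..., an] of fixed length n + 1; the function names are illustrative only.

```python
def D(p):
    """Derivative of a0 + a1*x + ... + an*x^n, kept at the same list length."""
    return [k * p[k] for k in range(1, len(p))] + [0]

def power(T, k):
    """The k-fold product T^k (T applied k times)."""
    def Tk(p):
        for _ in range(k):
            p = T(p)
        return p
    return Tk

n = 3
x_cubed = [0, 0, 0, 1]                 # the polynomial x^3
after_n = power(D, n)(x_cubed)         # D^n does not annihilate x^n
after_n_plus_1 = power(D, n + 1)(x_cubed)
print(after_n, after_n_plus_1)
```

Since D^(n+1) sends every basis polynomial 1, x, ..., x^n to zero while D^n does not annihilate x^n, D is nilpotent of index n + 1 on this space.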


Exercises 

1. Prove that the sum and scalar multiple of linear transformations are 
linear. 

2. Show that the transformations Z, I, and $-T$ are entitled to the names
zero, identity, and negative.

3. Verify that each transformation listed in Examples (a) to (d) is linear.

4. Prove that $\theta T = \theta$ for any linear transformation T.

5. If we regard the complex numbers as a vector space over the real field, 
is the conjugate mapping, (a + ib)T = a — ib, a linear transformation? 

6. Let T be a linear transformation of $\mathcal{V}$ into $\mathcal{W}$.

(i) Show that any subspace of $\mathcal{V}$ is mapped by T into a subspace of $\mathcal{W}$.

(ii) Conversely, show that if $\mathcal{Y}$ is a subspace of $\mathcal{W}$, the set of all vectors
mapped into $\mathcal{Y}$ is a subspace of $\mathcal{V}$.

7. (i) Show that a linear transformation T from $\mathcal{V}_n$ to $\mathcal{W}$ is determined
by the effect of T on any basis of $\mathcal{V}_n$.

(ii) Conversely, let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis of $\mathcal{V}_n$ and let $\{\beta_1, \ldots, \beta_n\}$
be any set of n vectors in $\mathcal{W}$. Show how the correspondence $\alpha_i \to \beta_i$ can
be used to define a linear transformation from all of $\mathcal{V}_n$ into $\mathcal{W}$.

8. Which of the following transformations on $\mathcal{E}_3$ are linear? Describe the
geometric effect of each.

(i) $(a_1, a_2, a_3)T = (a_1 + 1, a_2 + 1, 0)$,

(ii) $(a_1, a_2, a_3)T = (a_2, a_1, a_3)$,

(iii) $(a_1, a_2, a_3)T = (a_1, a_2, 1)$,

(iv) $(a_1, a_2, a_3)T = (a_1, -a_2, -a_3)$.

9. Let $\mathcal{V}$, $\mathcal{W}$, $\mathcal{X}$, $\mathcal{Y}$ be vector spaces over $\mathcal{F}$, and let R, S, T be linear
mappings, respectively, of $\mathcal{V}$ into $\mathcal{W}$, $\mathcal{W}$ into $\mathcal{X}$, and $\mathcal{X}$ into $\mathcal{Y}$.

(i) Prove that $R \,\square\, S$ is a linear mapping of $\mathcal{V}$ into $\mathcal{X}$.

(ii) Prove that $(R \,\square\, S) \,\square\, T = R \,\square\, (S \,\square\, T)$.

10. In the space of all polynomials P of all degrees define mappings M and
D by

$$P(x)D = \frac{d}{dx}P(x),$$

$$P(x)M = xP(x).$$

(i) Prove that both D and M are linear transformations.

(ii) Is D nilpotent on this space? Compare with D in Example (d).

(iii) Prove that $MD - DM = I$.

(iv) Deduce that $(DM)^2 = D^2M^2 + DM$.


§ 3.2. Rank and Nullity of a Linear Transformation

We recall that a linear transformation T is, by definition, a homomorphism
from a space $\mathcal{V}$ into a space $\mathcal{W}$, where $\mathcal{V}$ and $\mathcal{W}$ may be the same space, but
are not necessarily so. The domain of T is the space $\mathcal{V}$, and the range of T
is a subset $\mathcal{R}_T$ of $\mathcal{W}$, the set of all images $\alpha T$ of the vectors of $\mathcal{V}$:

$$\mathcal{R}_T = \{\beta \in \mathcal{W} \mid \beta = \alpha T \text{ for some } \alpha \in \mathcal{V}\}.$$

It is easily proved that $\mathcal{R}_T$ is actually a subspace of $\mathcal{W}$, for if $\beta, \gamma \in \mathcal{R}_T$, then
$\beta = \alpha_1 T$ and $\gamma = \alpha_2 T$. Hence $\beta + \gamma = \alpha_1 T + \alpha_2 T = (\alpha_1 + \alpha_2)T \in \mathcal{R}_T$.
Similarly for $c \in \mathcal{F}$, $c\beta = c(\alpha_1 T) = (c\alpha_1)T \in \mathcal{R}_T$. By Theorem 2.2, $\mathcal{R}_T$ is a
subspace of $\mathcal{W}$.

Another important set associated with any vector space homomorphism T
is the kernel $\mathcal{N}_T$ of the homomorphism, which is defined to be the set of all
vectors in $\mathcal{V}$ which are mapped into $\theta$:

$$\mathcal{N}_T = \{\alpha \in \mathcal{V} \mid \alpha T = \theta\}.$$

To see that $\mathcal{N}_T$ is a subspace of $\mathcal{V}$, let $\alpha, \beta \in \mathcal{N}_T$, $c \in \mathcal{F}$. Then $(\alpha + \beta)T =
\alpha T + \beta T = \theta + \theta = \theta$, so $\alpha + \beta \in \mathcal{N}_T$; also $(c\alpha)T = c(\alpha T) = c\theta = \theta$, so
$c\alpha \in \mathcal{N}_T$. Thus $\mathcal{N}_T$ is a subspace of $\mathcal{V}$.

These two subspaces, $\mathcal{R}_T$ and $\mathcal{N}_T$, called, respectively, the range space of T
and the null space of T, are of major importance in the study of linear algebra,
as are their dimensions.

Definition 3.5.

(a) The range space $\mathcal{R}_T$ of a linear transformation T is the set of all
images $\alpha T \in \mathcal{W}$ as $\alpha$ ranges over $\mathcal{V}$.

(b) The rank $\rho(T)$ of a linear transformation T is the dimension of its
range space.

Definition 3.6.

(a) The null space $\mathcal{N}_T$ of a linear transformation T is the set of all
vectors $\alpha \in \mathcal{V}$ for which $\alpha T = \theta \in \mathcal{W}$.

(b) The nullity $\nu(T)$ of a linear transformation T is the dimension of its
null space.

Theorem 3.2. If T is a linear transformation from $\mathcal{V}$ to $\mathcal{W}$ and if S is
a linear transformation from $\mathcal{W}$ to $\mathcal{Y}$, then

(a) $\mathcal{R}_{TS} \subseteq \mathcal{R}_S$ and $\rho(TS) \le \rho(S)$,

(b) $\mathcal{N}_{TS} \supseteq \mathcal{N}_T$ and $\nu(TS) \ge \nu(T)$.

proof: Exercise.

Thus we see that each linear transformation T from $\mathcal{V}$ to $\mathcal{W}$ automatically
selects a subspace $\mathcal{N}_T$ of $\mathcal{V}$ and a subspace $\mathcal{R}_T$ of $\mathcal{W}$. These two subspaces
are related in an interesting manner, as indicated by the following theorem
and its corollaries.

Theorem 3.3. Let $\{\alpha_1, \ldots, \alpha_{\nu(T)}\}$ be a basis for $\mathcal{N}_T$. Extend this
basis to any basis $\{\alpha_1, \ldots, \alpha_{\nu(T)}, \alpha_{\nu(T)+1}, \ldots, \alpha_n\}$ for $\mathcal{V}_n$. Then
$\{\alpha_{\nu(T)+1}T, \ldots, \alpha_n T\}$ is a basis for $\mathcal{R}_T$.

proof: Let $\{\alpha_1, \ldots, \alpha_n\}$ be chosen as in the statement of the theorem.
Any vector of $\mathcal{R}_T$ is of the form $\xi T$ for some $\xi \in \mathcal{V}_n$. Let $\xi = \sum_{i=1}^n a_i\alpha_i$;
then

$$\xi T = \sum_{i=1}^{n} a_i(\alpha_i T) = \sum_{i=\nu(T)+1}^{n} a_i(\alpha_i T),$$

since $\alpha_i T = \theta$ for $i = 1, 2, \ldots, \nu(T)$. Hence $\{\alpha_{\nu(T)+1}T, \ldots, \alpha_n T\}$ spans
$\mathcal{R}_T$. Since we do not know the dimension of $\mathcal{R}_T$ we must also prove
linear independence. Suppose scalars $b_i$, not all zero, exist such that

$$\theta = \sum_{i=\nu(T)+1}^{n} b_i(\alpha_i T) = \Bigl(\sum_{i=\nu(T)+1}^{n} b_i\alpha_i\Bigr)T.$$

Then $\sum_{i=\nu(T)+1}^{n} b_i\alpha_i \in \mathcal{N}_T$; but $\{\alpha_1, \ldots, \alpha_{\nu(T)}\}$ spans $\mathcal{N}_T$, so for suitable
scalars $c_i$,

$$\sum_{i=\nu(T)+1}^{n} b_i\alpha_i = \sum_{i=1}^{\nu(T)} c_i\alpha_i.$$

This contradicts the linear independence of $\{\alpha_1, \ldots, \alpha_n\}$, so the vectors
$\{\alpha_{\nu(T)+1}T, \ldots, \alpha_n T\}$ are linearly independent and therefore form a
basis for $\mathcal{R}_T$.


Theorem 3.4. If T is a linear transformation from $\mathcal{V}_n$ to $\mathcal{W}$, then
$\rho(T) + \nu(T) = n$.

proof: Exercise.
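
A numerical illustration of Theorems 3.3 and 3.4, not a proof, may help fix the ideas. The sketch below assumes a transformation of rational triples into pairs that is specified by the images of the unit vectors (one image per row, matching right-hand notation); the function names and the sample data are hypothetical. Row reduction is carried out on each image together with a record of which vector produced it, so the rows whose image becomes zero display a basis of the null space.

```python
from fractions import Fraction

def reduce_with_tracking(images):
    """Row-reduce the images; return pairs (image, source) with source T = image."""
    n, m = len(images), len(images[0])
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(images)]
    pivot_row = 0
    for col in range(m):
        pr = next((r for r in range(pivot_row, n) if aug[r][col] != 0), None)
        if pr is None:
            continue
        aug[pivot_row], aug[pr] = aug[pr], aug[pivot_row]
        piv = aug[pivot_row][col]
        aug[pivot_row] = [x / piv for x in aug[pivot_row]]
        for r in range(n):
            if r != pivot_row and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    return [(row[:m], row[m:]) for row in aug]

def apply(xi, images):
    """xi T, where T sends the i-th unit vector to images[i]."""
    return [sum(c * row[j] for c, row in zip(xi, images)) for j in range(len(images[0]))]

images = [[1, 2], [2, 4], [0, 1]]          # assumed images of the three unit vectors
pairs = reduce_with_tracking(images)
range_basis = [img for img, src in pairs if any(img)]
null_basis = [src for img, src in pairs if not any(img)]
rank, nullity = len(range_basis), len(null_basis)
print(rank, nullity, null_basis)
```

Here the counts add up to n = 3, and each vector reported in `null_basis` can be fed back through `apply` to confirm that it is mapped to zero.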


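Now if we consider $\mathcal{W}$ and $\mathcal{V}_n$ to be the same space, then $T, T^2, T^3, \ldots$ are
all well-defined transformations of $\mathcal{V}_n$ into itself. The corresponding range
and null spaces form chains, as indicated in the following result.

Theorem 3.5. If T is a linear transformation on $\mathcal{V}_n$, then

(a) $\mathcal{V}_n \supseteq \mathcal{R}_T \supseteq \mathcal{R}_{T^2} \supseteq \cdots \supseteq \mathcal{R}_{T^k} \supseteq \cdots$,

(b) $[\theta] \subseteq \mathcal{N}_T \subseteq \mathcal{N}_{T^2} \subseteq \cdots \subseteq \mathcal{N}_{T^k} \subseteq \cdots$.

Furthermore, if p is a positive integer such that $\mathcal{R}_{T^p} = \mathcal{R}_{T^{p+1}}$, then for
every integer $k \ge 1$ we have $\mathcal{R}_{T^p} = \mathcal{R}_{T^{p+k}}$ and $\mathcal{N}_{T^p} = \mathcal{N}_{T^{p+k}}$.

proof: The chains are established by repeated application of
Theorem 3.2. Furthermore, in a finite-dimensional space equality must hold
somewhere in each chain. The dimension relation of Theorem 3.4 shows
that $\mathcal{R}_{T^p} = \mathcal{R}_{T^{p+1}}$ if and only if $\mathcal{N}_{T^p} = \mathcal{N}_{T^{p+1}}$. Now assume $\mathcal{N}_{T^p} = \mathcal{N}_{T^{p+1}}$,
and let $\xi \in \mathcal{N}_{T^{p+k}}$. Then

$$\theta = \xi T^{p+k} = (\xi T^{k-1})T^{p+1}.$$

Hence $\xi T^{k-1} \in \mathcal{N}_{T^{p+1}} = \mathcal{N}_{T^p}$. If $k > 1$, the argument may be repeated
to give $\xi T^{k-2} \in \mathcal{N}_{T^p}$, etc., so finally $\xi \in \mathcal{N}_{T^p}$. Thus $\mathcal{N}_{T^{p+k}} \subseteq \mathcal{N}_{T^p}$, and
the chain relation gives $\mathcal{N}_{T^p} \subseteq \mathcal{N}_{T^{p+k}}$, so equality holds.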

This result gives us a fairly clear picture of the range and null spaces of
an iterated transformation. As T is iterated on $\mathcal{V}_n$, the corresponding null
spaces form a strictly increasing sequence of subspaces up to a certain number
$p \le n$ of iterations, at which point the increase stops. Thereafter, further
application of T maps into $\theta$ only those vectors which are mapped into $\theta$ by
$T^p$. Likewise, the range spaces form a strictly decreasing sequence up to p
iterations, and further application of T maps $\mathcal{R}_{T^p}$ onto itself.
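
The behavior described above can be watched on a small example. The sketch assumes the transformation (x, y, z)T = (y, 0, z) on rational triples, stored by the images of the unit vectors (one per row), so that the array for T^2 is the ordinary product of the array with itself; `rank` and `matmul` are illustrative helpers, and the nullity is obtained from Theorem 3.4 as n minus the rank.

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the span of the rows, by elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pr = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[rk], m[pr] = m[pr], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

def matmul(A, B):
    """Array of the product transformation (A applied first, then B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Assumed example: (x, y, z)T = (y, 0, z); rows are the images of the unit vectors.
A = [[0, 0, 0], [1, 0, 0], [0, 0, 1]]
n = len(A)

ranks, power = [], A
for _ in range(4):                 # T, T^2, T^3, T^4
    ranks.append(rank(power))
    power = matmul(power, A)
nullities = [n - r for r in ranks]
print(ranks, nullities)
```

For this choice the ranks drop once and then stay fixed, so the chains stop changing at p = 2.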

Exercises 

1. Prove Theorem 3.2. 

2. Prove Theorem 3.4. 

3. Let S and T be linear transformations from $\mathcal{V}_n$ to $\mathcal{V}_n$; prove the
following relations for rank and nullity.

(i) $\rho(T + S) \le \rho(T) + \rho(S)$.

(ii) $\nu(T + S) \ge \nu(T) + \nu(S) - n$.

(iii) $\nu(T) + \nu(S) \ge \nu(TS) \ge \max\{\nu(T), \nu(S)\}$.

(iv) $\rho(T) + \rho(S) - n \le \rho(TS) \le \min\{\rho(T), \rho(S)\}$.

4. Specify the range space and null space of each of the linear transforma- 
tions given in the examples of § 3.1. 

5. Illustrate the statements of Exercise 3 above by using the linear
transformations $T_2$ and $T_3$ of Example (c), § 3.1.

6. Demonstrate by specific examples that the second inequality of
Exercise 3(iii) and the first inequality of Exercise 3(iv) need not be valid when
T is a linear transformation from $\mathcal{V}_n$ to $\mathcal{W}_m$ and S a linear transformation
from $\mathcal{W}_m$ to $\mathcal{Y}_p$.

7. If T is a linear transformation of rank 1 from $\mathcal{V}$ to $\mathcal{V}$, prove that
$T^2 = cT$ for some scalar c.

8. Let T be a nilpotent linear transformation on $\mathcal{V}$, so that for every
$\eta \in \mathcal{V}$, $\eta T^p = \theta$, but for some $\xi$, $\xi T^{p-1} \ne \theta$.

(i) Show that $\{\xi, \xi T, \xi T^2, \ldots, \xi T^{p-1}\}$ is linearly independent.

(ii) If $\mathcal{S}$ is the subspace spanned by the vectors of (i), show that
$\sigma T \in \mathcal{S}$ for every $\sigma \in \mathcal{S}$. (That is, T maps the space $\mathcal{S}$ into itself. Such a space
is said to be invariant under T, or T-invariant.)

9. Let T be a linear transformation from $\mathcal{V}$ into $\mathcal{W}$. A mapping $\bar{T}$ is
defined from $\mathcal{V}/\mathcal{N}_T$ (Exercise 8, § 2.3) into $\mathcal{W}$ as follows:

$$(\xi + \mathcal{N}_T)\bar{T} = \xi T.$$

Show that the mapping $\bar{T}$ is unambiguously defined, linear, one-to-one, and
onto $\mathcal{R}_T$. In short, $\bar{T}$ is an isomorphism of $\mathcal{V}/\mathcal{N}_T$ onto $\mathcal{R}_T$.


§ 3.3. Nonsingular Transformations

The range and null spaces of a linear transformation T have an intrinsic
connection with the concept of nonsingularity of T, which can be characterized
in numerous ways. The essential idea is one-to-one-ness. A nonsingular linear
transformation is distinguished from an arbitrary linear transformation in
the same way that an isomorphism is distinguished from a homomorphism.
This means, of course, that the null space of a nonsingular transformation
is $[\theta]$, and therefore the mapping thus defined is reversible.

Definition 3.7. A linear transformation T from $\mathcal{V}$ into $\mathcal{W}$ is said to be
nonsingular if and only if there exists a mapping T* from $\mathcal{R}_T$ onto $\mathcal{V}$ such
that TT* = I, where I is the identity mapping on $\mathcal{V}$.

Although this definition does not explicitly require T* to be linear, it is
necessarily so. To deduce this fact, let T* be a mapping from $\mathcal{R}_T$ onto $\mathcal{V}$ such
that TT* = I. If $\alpha, \beta \in \mathcal{R}_T$, there exist $\xi, \eta \in \mathcal{V}$ such that $\xi T = \alpha$, $\eta T = \beta$.
Then $\xi TT^* = \alpha T^* = \xi$ and $\beta T^* = \eta$. For any $a, b \in \mathcal{F}$,

$$(a\alpha + b\beta)T^* = (a\xi T + b\eta T)T^* = (a\xi + b\eta)TT^*$$

$$= (a\xi + b\eta)I = a(\alpha T^*) + b(\beta T^*).$$

Thus the linearity of T and the property TT* = I imply that T* is linear.
Furthermore, T* is uniquely determined by T if T is nonsingular.

Theorem 3.6. Let T be a linear transformation on $\mathcal{V}_n$ to $\mathcal{W}$; the
following statements are equivalent.

(a) T is nonsingular.

(b) For all $\alpha, \beta \in \mathcal{V}_n$, if $\alpha T = \beta T$, then $\alpha = \beta$.

(c) $\mathcal{N}_T = [\theta]$.

(d) $\nu(T) = 0$.

(e) $\rho(T) = n$.

(f) T maps any basis for $\mathcal{V}_n$ onto a basis for $\mathcal{R}_T$.

proof: Our proof consists of a cycle of implications.




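(a) implies (b). Assume T is nonsingular and that $\alpha T = \beta T$. Then
$(\alpha T)T^* = (\beta T)T^*$, $\alpha I = \beta I$, and $\alpha = \beta$.

(b) implies (c). If $\xi \in \mathcal{N}_T$, then $\xi T = \theta = \theta T$, so $\xi = \theta$. Hence $\mathcal{N}_T = [\theta]$.

(c) implies (d). Definition 3.6.

(d) implies (e). Theorem 3.4.

(e) implies (f). Theorem 3.3.

(f) implies (a). Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for $\mathcal{V}_n$; then $\{\alpha_1 T, \ldots, \alpha_n T\}$
is a basis for $\mathcal{R}_T$. Hence each $\eta \in \mathcal{R}_T$ has a unique expression of
the form $\eta = \sum_{i=1}^n b_i(\alpha_i T)$. Let T* be the mapping from $\mathcal{R}_T$ to $\mathcal{V}_n$
defined by $\eta T^* = \sum_{i=1}^n b_i\alpha_i$. We must show that TT* = I on $\mathcal{V}_n$.
For each $\xi \in \mathcal{V}_n$,

$$\xi = \sum_{i=1}^n a_i\alpha_i,$$

$$\xi T = \Bigl(\sum_{i=1}^n a_i\alpha_i\Bigr)T = \sum_{i=1}^n a_i(\alpha_i T) \in \mathcal{R}_T,$$

$$(\xi T)T^* = \sum_{i=1}^n a_i\alpha_i = \xi \quad \text{by definition of } T^*.$$

Hence TT* = I on $\mathcal{V}_n$.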

Since a linear transformation is a homomorphism of $\mathcal{V}$ onto $\mathcal{R}_T$, a
nonsingular transformation is simply an isomorphism of $\mathcal{V}$ onto $\mathcal{R}_T$. This
interpretation lends intuitive feeling to the statements of the preceding theorem,
since in an isomorphism distinct vectors have distinct images, the kernel is
trivial, and it seems entirely reasonable that dimension must be preserved.
We remark that if T maps $\mathcal{V}$ into $\mathcal{V}$, then T is nonsingular if and only if
$\mathcal{R}_T = \mathcal{V}$. If we insist on distinguishing $\mathcal{V}$ and $\mathcal{W}$, then it is still true that
T is an isomorphism between $\mathcal{V}$ and $\mathcal{R}_T$ if and only if T is nonsingular.

Interestingly enough, these observations remain valid even when $\mathcal{V}$ and $\mathcal{W}$
are infinite-dimensional spaces. Of course in that case Theorem 3.6 must be
amended by deleting statement (e) and the subscript on $\mathcal{V}_n$; otherwise, the
theorem and most of the proof remain valid. In contrast to the definition
of nonsingularity given here, a linear transformation T from $\mathcal{V}$ to $\mathcal{W}$ is
sometimes defined to be nonsingular if and only if there exists a mapping T*
from $\mathcal{W}$ to $\mathcal{V}$ (instead of from $\mathcal{R}_T$ to $\mathcal{V}$) such that TT* is the identity mapping
on $\mathcal{V}$ and T*T is the identity mapping on $\mathcal{W}$ (instead of on $\mathcal{R}_T$). The two
definitions coincide if $\mathcal{V}$ and $\mathcal{W}$ are finite-dimensional spaces of the same
dimension, but not otherwise. (See Exercise 5.)

The significance of Theorem 3.6 (as well as other results we prove about
linear transformations) will become more evident when the theory of matrices
is developed in subsequent chapters. Indeed, for every theorem we prove
about linear transformations there is a corresponding theorem about matrices.
The next result is surprisingly simple to prove in terms of linear
transformations, but a matrix proof of the corresponding result is relatively obscure.

Theorem 3.7. If T is a linear transformation from $\mathcal{V}$ to $\mathcal{W}$ and if T* is
a mapping from $\mathcal{R}_T$ to $\mathcal{V}$ such that TT* = I on $\mathcal{V}$, then T*T = I on $\mathcal{R}_T$.

proof: By hypothesis, T is nonsingular, so any $\beta \in \mathcal{R}_T$ can be
represented uniquely as $\beta = \alpha T$ for some $\alpha \in \mathcal{V}$, by Theorem 3.6(b). Then
$\beta(T^*T) = (\alpha T)(T^*T) = \alpha(TT^*)T = \alpha T = \beta$. Hence T*T = I.
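
A concrete instance of Definition 3.7 and Theorem 3.7 is sketched below under assumed data: T sends pairs into triples by (x, y)T = (x, y, x + y), which is one-to-one but not onto, and T* is defined only on the range of T. Both functions are hypothetical illustrations.

```python
def T(v):
    """(x, y)T = (x, y, x + y): a one-to-one map of pairs into triples."""
    x, y = v
    return (x, y, x + y)

def T_star(w):
    """Reverse mapping, defined only on the range of T (triples with c == a + b)."""
    a, b, c = w
    if c != a + b:
        raise ValueError("T* is defined only on the range of T")
    return (a, b)

xi = (3, -4)          # a vector of the domain
beta = T((2, 5))      # a vector of the range of T

right_inverse = T_star(T(xi)) == xi       # xi(TT*) = xi
left_inverse = T(T_star(beta)) == beta    # beta(T*T) = beta on the range
print(right_inverse, left_inverse)
```

Note that `T_star` rejects triples outside the range, which reflects the fact that T* is a mapping from the range space only.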

Thus in algebraic language T is nonsingular if and only if there exists a
transformation T* which is both a left inverse and a right inverse of T. Hence
we call T* the inverse of T and write $T^{-1}$ instead of T*. Observe that not
every nonzero linear transformation has an inverse; for example, $T_1$ of
Example (c), § 3.1, has a one-dimensional range space and hence is singular but
nonzero.

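Theorem 3.8. Let $\mathcal{V}$, $\mathcal{W}_n$, $\mathcal{X}$, $\mathcal{Y}$ be vector spaces, and let $S_1$, T, and $S_2$
be linear transformations defined, respectively, from $\mathcal{V}$ into $\mathcal{W}_n$, from $\mathcal{W}_n$
into $\mathcal{X}$, and from $\mathcal{R}_T$ into $\mathcal{Y}$. If T is nonsingular, then $\rho(S_1T) = \rho(S_1)$
and $\rho(TS_2) = \rho(S_2)$.

proof: Since T is nonsingular, $\dim \mathcal{R}_{S_1}T = \dim \mathcal{R}_{S_1}$; but $\mathcal{R}_{S_1}T = \mathcal{R}_{S_1T}$,
so $\rho(S_1) = \rho(S_1T)$. Also $\mathcal{R}_{TS_2} = \mathcal{R}_T S_2 = \mathcal{R}_{S_2}$, so $\rho(TS_2) = \rho(S_2)$. As a
particular case, if $\mathcal{V} = \mathcal{W}_n = \mathcal{X} = \mathcal{Y}$ and if $S_1 = S_2$, we have $\rho(TS_1) =
\rho(S_1T) = \rho(S_1)$ whenever T is nonsingular.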

It should be observed that $\mathcal{R}_{ST}$ and $\mathcal{R}_S$ need not be equal, since T might
map a vector of $\mathcal{R}_S$ into a vector which is not in $\mathcal{R}_S$. All we know is that $\mathcal{R}_{ST}$
and $\mathcal{R}_S$ have the same dimension. As an example, let T be the rotation of
the plane through an angle of 45°, and let S be the projection of (x, y) onto
(x, 0). Then $\mathcal{R}_S$ is the line y = 0 and $\mathcal{R}_{ST}$ is the line y = x.
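
The rotation-and-projection example can be checked in floating point. This sketch reuses the rotation formula of Example (b), § 3.1; the sample points and the tolerance are arbitrary choices made for illustration.

```python
import math

def rotate(psi):
    """Rotation through psi: (x, y)T = (x cos psi - y sin psi, x sin psi + y cos psi)."""
    return lambda v: (v[0] * math.cos(psi) - v[1] * math.sin(psi),
                      v[0] * math.sin(psi) + v[1] * math.cos(psi))

S = lambda v: (v[0], 0.0)        # projection of (x, y) onto (x, 0)
T = rotate(math.pi / 4)          # rotation through 45 degrees
ST = lambda v: T(S(v))           # right-hand notation: S first, then T

points = [(1.0, 5.0), (-2.0, 3.0), (4.0, 0.0)]
images_S = [S(v) for v in points]
images_ST = [ST(v) for v in points]

on_x_axis = all(y == 0.0 for _, y in images_S)
on_line_y_eq_x = all(math.isclose(x, y, abs_tol=1e-12) for x, y in images_ST)
print(on_x_axis, on_line_y_eq_x)
```

Both range spaces are lines through the origin, so they have the same dimension even though they are different subspaces.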

Theorem 3.9. Let T be a linear transformation from $\mathcal{V}$ into $\mathcal{W}$ and
let S be a linear transformation from $\mathcal{R}_T$ into $\mathcal{Y}$. Then TS is nonsingular
if and only if T and S are nonsingular. If TS is nonsingular, then
$(TS)^{-1} = S^{-1}T^{-1}$.

proof: Exercise.

Theorem 3.10. If T is nonsingular, then $T^{-1}$ is nonsingular, and

(a) $(T^{-1})^{-1} = T$,

(b) $(cT)^{-1} = c^{-1}T^{-1}$, if $c \ne 0$.

proof: Exercise.

Exercises

1. Prove Theorem 3.9.

2. Prove Theorem 3.10.

3. Any linear transformation of the plane is determined by its effect
on the two vectors $\epsilon_1 = (1, 0)$ and $\epsilon_2 = (0, 1)$. Suppose $\epsilon_1 T = (a, b)$ and
$\epsilon_2 T = (c, d)$. Express in terms of a, b, c, d a necessary and sufficient
condition that T be nonsingular. Interpret geometrically.

4. Show that the set of all nonsingular linear transformations on $\mathcal{V}_n$ form
a group. This group is called the full linear group $\mathcal{L}_n(\mathcal{F})$.

5. Let $\mathcal{P}$ be the infinite-dimensional space of all real polynomials and let
$\mathcal{P}_0$ be the subspace of all polynomials P for which $P(0) = 0$. Consider the
linear transformations defined as follows:

$$P(x)J = \int_0^x P(t)\,dt \quad \text{for all } P(x) \in \mathcal{P},$$

$$P(x)D = \frac{d}{dx}P(x) \quad \text{for all } P(x) \in \mathcal{P},$$

$$P(x)D_0 = \frac{d}{dx}P(x) \quad \text{for all } P(x) \in \mathcal{P}_0.$$

(i) Determine the domain and range of each of the transformations

$$J,\ D,\ D_0,\ JD,\ DJ,\ JD_0,\ D_0J.$$

(ii) Which of the four product transformations in (i) is the identity
transformation on its domain?

(iii) Which of the seven transformations in (i) are nonsingular?

(iv) Explain wherein these results are consistent with Theorem 3.7
and Theorem 3.9.

6. Let $\mathcal{S}_k$ be a k-dimensional subspace of $\mathcal{V}_n$.

(i) Show that $\mathcal{S}_k$ is the null space of a suitably defined linear
transformation T from $\mathcal{V}_n$ into $\mathcal{V}_n$. (Exercise 8, § 2.7.)

(ii) Deduce that the dimension of $\mathcal{V}_n/\mathcal{S}_k$ is $n - k$. (Exercise 9, § 3.2.)

(iii) Describe a basis for $\mathcal{V}_n/\mathcal{S}_k$ which is related in a natural way to a
suitably chosen basis for $\mathcal{V}_n$.


§3.4. Dual Space 

Up to this point we have been studying linear transformations from one
vector space $\mathcal{V}$ over $\mathcal{F}$ to any other vector space $\mathcal{W}$ over $\mathcal{F}$, a study which
is resumed in § 3.7. But now we digress slightly in order to investigate two
special choices for $\mathcal{W}$; in this section and the next we let $\mathcal{W} = \mathcal{F}$, and in
§ 3.6 we let $\mathcal{W} = \mathcal{V}$. In each case we obtain results which are peculiar to that
situation, yet important for an understanding of general topics which will
arise later in our study of linear algebra. A reader who prefers to move
directly to § 3.7 may do so without any knowledge of the intervening topics,
returning to these sections later.

We therefore turn our attention to scalar-valued linear functions defined
on a vector space $\mathcal{V}$ over $\mathcal{F}$. Since $\mathcal{F}$ may be regarded as a vector space over
itself, we consider the set of all linear transformations from $\mathcal{V}$ to $\mathcal{W}$,
specialized to the case where $\mathcal{W} = \mathcal{F}$. (Exercise 7, § 2.7.) In this case a linear
transformation T from $\mathcal{V}$ to $\mathcal{F}$ assigns to each vector $\xi \in \mathcal{V}$ a scalar $\xi T \in \mathcal{F}$;
of course linearity means that for all $\xi, \eta \in \mathcal{V}$ and all $a, b \in \mathcal{F}$, $(a\xi + b\eta)T =
a(\xi T) + b(\eta T)$ in $\mathcal{F}$. From Theorem 3.1 it follows that the set of all linear
functions from $\mathcal{V}$ to $\mathcal{F}$ forms a vector space over $\mathcal{F}$.

Because this is a particular situation, rather than general, we shall introduce 
special terminology and notation. 

Definition 3.8. Let $\mathcal{V}$ be a vector space over $\mathcal{F}$. A linear transformation
from $\mathcal{V}$ into $\mathcal{F}$ is called a linear functional; linear functionals will be
denoted by small Latin letters, usually in boldface type. The vector
space of all linear functionals on $\mathcal{V}$ is called the dual space of $\mathcal{V}$ and is
denoted by $\mathcal{V}'$.

Examples of Linear Functionals 

Several important examples of linear functionals on infinite-dimensional 
spaces are familiar from our study of elementary analysis. 

(a) Let $\mathcal{V}$ be the space of all real-valued functions integrable for $a \le t \le b$,
and let j be the functional defined by $f\mathbf{j} = \int_a^b f(t)\,dt$ for each $f \in \mathcal{V}$. Thus
j is a linear mapping which assigns to each integrable function f a real
number $f\mathbf{j}$, called the integral of f on the interval $a \le t \le b$. More generally,
if $\phi$ is a fixed function which is integrable for $a \le t \le b$, then h is a linear
functional where

$$f\mathbf{h} = \int_a^b f(t)\phi(t)\,dt.$$

(b) Let $\mathcal{P}$ be the space of all real polynomials, and let $\mathbf{d}_a$ be defined for
each $P \in \mathcal{P}$ by

$$P\mathbf{d}_a = P'(a),$$

where $P'$ denotes the derivative of P. Then $\mathbf{d}_a$ is a linear functional.

(c) Let $\mathcal{C}$ denote the space of all convergent sequences of real numbers.
A linear functional is obtained by associating with each sequence the real
number to which it converges.

(d) If $\mathcal{V}$ is any n-dimensional space over $\mathcal{F}$ and if $\xi \in \mathcal{V}$, let $\xi = \sum_{i=1}^n x_i\alpha_i$,
where $\{\alpha_1, \ldots, \alpha_n\}$ is any fixed basis. Let $\gamma = \sum_{i=1}^n c_i\alpha_i$ be any fixed vector
of $\mathcal{V}$. The mapping $\mathbf{f}_\gamma$, defined by

$$\xi\mathbf{f}_\gamma = c_1x_1 + c_2x_2 + \cdots + c_nx_n,$$

is a linear functional on $\mathcal{V}$.

The last example reveals the nature of linear functionals on $\mathcal{V}_n$ so
completely that we repeat a description of the ideas involved. We begin with
any basis $\{\alpha_1, \ldots, \alpha_n\}$ for $\mathcal{V}_n$; each fixed $\gamma \in \mathcal{V}_n$ determines a linear
functional $\mathbf{f}_\gamma$ on $\mathcal{V}_n$. The scalar value $\xi\mathbf{f}_\gamma$ which is assigned to the vector $\xi$ by the
linear functional $\mathbf{f}_\gamma$ is a fixed linear combination of the coordinates of $\xi$
(relative to the $\alpha$-basis), and the coefficients of that linear combination are
the coordinates of $\gamma$ (relative to the $\alpha$-basis). As $\gamma$ is varied, we obtain various
linear functionals. In particular, if $\gamma = \theta$, then $\mathbf{f}_\theta$ maps each $\xi$ onto 0; if
$\gamma = \alpha_1$, $\mathbf{f}_\gamma$ maps each $\xi$ into its first $\alpha$-coordinate, and so on. We consider,
therefore, the n linear functionals $\{\mathbf{f}_{\alpha_1}, \ldots, \mathbf{f}_{\alpha_n}\}$ which correspond in this
manner to the vectors of the chosen basis, and we investigate the role they
play in the dual space $\mathcal{V}'$.

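To begin with, $\mathbf{f}_{\alpha_j}$ maps $\alpha_j$ into 1 and $\alpha_i$ into 0
if $i \ne j$. Hence

$$\alpha_i\mathbf{f}_{\alpha_j} = \delta_{ij}, \quad \text{for } i, j = 1, \ldots, n,$$

where the symbol $\delta_{ij}$ is called the Kronecker delta and is defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j, \end{cases} \quad \text{for } i, j = 1, 2, \ldots, n.$$

Next we show that $\{\mathbf{f}_{\alpha_1}, \ldots, \mathbf{f}_{\alpha_n}\}$ forms a
basis for $\mathcal{V}'$. Clearly the zero element of $\mathcal{V}'$ is the linear
functional $\mathbf{f}_\theta$ which maps each $\xi$ into 0. To verify that the
$\mathbf{f}_{\alpha_i}$ are linearly independent, we suppose that

$$\sum_{k=1}^{n} c_k\mathbf{f}_{\alpha_k} = \mathbf{f}_\theta.$$

Then for $i = 1, \ldots, n$,

$$0 = \alpha_i\mathbf{f}_\theta = \alpha_i\sum_{k=1}^{n} c_k\mathbf{f}_{\alpha_k} = \sum_{k=1}^{n} c_k\,\alpha_i\mathbf{f}_{\alpha_k} = \sum_{k=1}^{n} c_k\delta_{ik} = c_i.$$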

To verify that the $\mathbf{f}_{\alpha_i}$ span $\mathcal{V}'$, we let $\mathbf{f}$
be any element of $\mathcal{V}'$, and let $a_i \in \mathcal{F}$ be defined by

$$a_i = \alpha_i\mathbf{f} \quad \text{for } i = 1, \ldots, n.$$

Then

$$\alpha_i\mathbf{f} = a_i = \sum_{j=1}^{n} a_j\delta_{ij} = \alpha_i\sum_{j=1}^{n} a_j\mathbf{f}_{\alpha_j}.$$

Hence both $\mathbf{f}$ and $\sum_{j=1}^{n} a_j\mathbf{f}_{\alpha_j}$ are linear
functionals on $\mathcal{V}$ whose values coincide on the basis
$\{\alpha_1, \ldots, \alpha_n\}$. Linearity then implies that these values coincide
on all of $\mathcal{V}$; that is, they are equal elements of $\mathcal{V}'$. We have
therefore proved the following theorem.

Theorem 3.11. Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for $\mathcal{V}_n$,
and let $\mathbf{f}_j$ be the linear functional defined on $\mathcal{V}_n$ by
prescribing

$$\alpha_i\mathbf{f}_j = \delta_{ij} \quad \text{for } i = 1, 2, \ldots, n.$$

Then $\{\mathbf{f}_1, \ldots, \mathbf{f}_n\}$ is a basis for $\mathcal{V}_n'$. Hence
$\mathcal{V}_n'$ is n-dimensional.

The basis for $\mathcal{V}'$ which is described in Theorem 3.11 is called the dual
basis of $\{\alpha_1, \ldots, \alpha_n\}$.
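
The defining condition of the dual basis can be checked numerically. The sketch below is only an illustration under stated assumptions: vectors are given as tuples of coordinates, a functional is stored as its tuple of coefficients (so that its value on a vector is a sum of products, as in example (d)), and the basis used is a hypothetical one chosen for the illustration. The helper names `inverse`, `dual_basis`, and `apply` are not from the text. Under these assumptions, if the basis vectors are written as the rows of an array, the coefficients of the jth dual functional are the jth column of the inverse array.

```python
from fractions import Fraction

def inverse(rows):
    """Invert a square array (list of rows) by Gauss-Jordan elimination."""
    n = len(rows)
    # Augment each row with the matching row of the identity array.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(rows)]
    for col in range(n):
        # Pick a row with a nonzero pivot; StopIteration means the rows are not a basis.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Coefficient tuples of f_1, ..., f_n with alpha_i f_j = delta_ij."""
    inv = inverse(basis)
    n = len(basis)
    # Column j of the inverse array holds the coefficients of f_j.
    return [tuple(inv[i][j] for i in range(n)) for j in range(n)]

def apply(vector, functional):
    """Scalar value of a functional (given by coefficients) on a vector."""
    return sum(Fraction(x) * c for x, c in zip(vector, functional))

# Hypothetical basis of a 3-dimensional space, chosen only for illustration.
basis = [(1, 1, 0), (0, 1, 1), (0, 0, 1)]
duals = dual_basis(basis)

# table[i][j] is alpha_i f_j; it should be the Kronecker delta.
table = [[apply(a, f) for f in duals] for a in basis]
```

For this choice the table of values is the identity array, which is exactly the condition prescribed in Theorem 3.11.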

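Theorem 3.12. If $\xi$ is any nonzero vector of $\mathcal{V}_n$, there exists a
linear functional $\mathbf{f} \in \mathcal{V}_n'$ such that $\xi\mathbf{f} \ne 0$.

proof: Let $\{\alpha_1, \ldots, \alpha_n\}$ be any basis for $\mathcal{V}_n$, and
let $\{\mathbf{f}_1, \ldots, \mathbf{f}_n\}$ be its dual basis for $\mathcal{V}_n'$.
Let $\xi = \sum_{i=1}^{n} x_i\alpha_i$. If $\xi \ne \theta$, there exists at least
one index j for which $x_j \ne 0$. Then $\xi\mathbf{f}_j = x_j \ne 0$.

It follows immediately that if $\xi \ne \eta$ in $\mathcal{V}_n$, then for some
$\mathbf{f} \in \mathcal{V}_n'$, $\xi\mathbf{f} \ne \eta\mathbf{f}$; we shall make
use of this observation in Theorem 3.13.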

But first we return to the general case in which $\mathcal{V}$ is not specifically
assumed to be finite-dimensional. Since $\mathcal{V}'$ is a vector space, it has a
dual space of its own which is denoted $(\mathcal{V}')'$ or simply $\mathcal{V}''$.
$\mathcal{V}''$ is called the second dual space or the bidual space of $\mathcal{V}$.
Thus any element of $\mathcal{V}'$ can be given two interpretations; sometimes we
wish to regard it as a linear mapping from $\mathcal{V}$ to $\mathcal{F}$, while at
other times we wish to regard it as a vector which is mapped into $\mathcal{F}$ by
each element of $\mathcal{V}''$. In order to distinguish between these two roles
of the elements of $\mathcal{V}'$, we shall modify our notation slightly, writing
$\mathbf{f}$ (boldface) for an element of $\mathcal{V}'$ regarded as a mapping from
$\mathcal{V}$ to $\mathcal{F}$, but writing $f$ (italic) for the same element of
$\mathcal{V}'$ regarded as a vector which is mapped into $\mathcal{F}$ by an element
of $\mathcal{V}''$. An element of $\mathcal{V}''$ will be denoted by $\mathbf{x}$.
Thus we have $\xi \in \mathcal{V}$, $\mathbf{f}$ or $f \in \mathcal{V}'$, and
$\mathbf{x} \in \mathcal{V}''$; then $\xi\mathbf{f} \in \mathcal{F}$ and
$f\mathbf{x} \in \mathcal{F}$.

Now we turn our attention to an important correspondence between the
elements of a vector space $\mathcal{V}$ and some of the elements of its bidual
space, $\mathcal{V}''$. Let $\xi$ be a fixed vector of $\mathcal{V}$. For each
$\mathbf{f} \in \mathcal{V}'$, $\xi\mathbf{f} \in \mathcal{F}$; this means that
$\xi$ can be used to attach a scalar value to each $\mathbf{f} \in \mathcal{V}'$.
In other words, $\xi$ determines a mapping from $\mathcal{V}'$ into $\mathcal{F}$.
Furthermore, that mapping is linear on $\mathcal{V}'$, because

$$\xi(a\mathbf{f}_1 + b\mathbf{f}_2) = a\,\xi\mathbf{f}_1 + b\,\xi\mathbf{f}_2.$$

Thus with each $\xi \in \mathcal{V}$ we associate the linear functional
$\mathbf{x}_\xi \in \mathcal{V}''$ which is defined by

$$f\mathbf{x}_\xi = \xi\mathbf{f} \quad \text{for each } \mathbf{f} \in \mathcal{V}'.$$

Furthermore, this correspondence or mapping from $\mathcal{V}$ to $\mathcal{V}''$ is
linear (a vector space homomorphism), since if $\xi_1$ corresponds to $\mathbf{x}_1$
and $\xi_2$ to $\mathbf{x}_2$, then for each $\mathbf{f} \in \mathcal{V}'$ and all
$a, b \in \mathcal{F}$,

$$(a\xi_1 + b\xi_2)\mathbf{f} = a\,\xi_1\mathbf{f} + b\,\xi_2\mathbf{f} = a\,f\mathbf{x}_1 + b\,f\mathbf{x}_2 = f[a\mathbf{x}_1 + b\mathbf{x}_2].$$

Hence $\mathcal{V}$ is homomorphic to a subspace of $\mathcal{V}''$. We naturally
wonder whether the homomorphism is an isomorphism (that is, whether the mapping
from $\mathcal{V}$ to $\mathcal{V}''$ is one-to-one), and if so whether
$\mathcal{V}$ is isomorphic to the full space $\mathcal{V}''$ (that is, whether the
mapping is onto $\mathcal{V}''$ rather than into $\mathcal{V}''$). The answers
to these questions are that the mapping is an isomorphism whether $\mathcal{V}$
is finite-dimensional or not, but the mapping is onto $\mathcal{V}''$ if and only
if $\mathcal{V}$ is finite-dimensional. However, we shall prove these assertions
only for the finite-dimensional case.

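Theorem 3.13. Let $\mathcal{V}$ be an n-dimensional vector space. Each
$\xi \in \mathcal{V}$ determines a mapping $\mathbf{x}_\xi$ from $\mathcal{V}'$ into
$\mathcal{F}$, defined by $f\mathbf{x}_\xi = \xi\mathbf{f}$ for all
$\mathbf{f} \in \mathcal{V}'$. Then $\mathbf{x}_\xi \in \mathcal{V}''$, and the
correspondence $\xi \to \mathbf{x}_\xi$ is an isomorphism of $\mathcal{V}$ onto
$\mathcal{V}''$.

proof: Our previous remarks show that the given correspondence is
a homomorphism of $\mathcal{V}$ onto a subspace $\mathcal{W}'' \subseteq \mathcal{V}''$.
By Theorem 3.12, if $\xi_1 \ne \xi_2$ in $\mathcal{V}$, then for some
$\mathbf{f} \in \mathcal{V}'$, $\xi_1\mathbf{f} \ne \xi_2\mathbf{f}$; thus
$f\mathbf{x}_1 \ne f\mathbf{x}_2$, and $\mathbf{x}_1 \ne \mathbf{x}_2$ in
$\mathcal{V}''$. Hence the correspondence is one-to-one. By Theorem 3.11,
$\mathcal{V}'$ is n-dimensional, and therefore so is $\mathcal{V}''$. If
$\{\alpha_1, \ldots, \alpha_n\}$ is a basis for $\mathcal{V}$ and if
$\alpha_i \to \mathbf{x}_i$, then $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$
is a basis for $\mathcal{W}''$. Thus $\mathcal{W}''$ is an n-dimensional subspace
of the n-dimensional space $\mathcal{V}''$, so $\mathcal{W}'' = \mathcal{V}''$.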

Exercises 

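1. Given $c \ne 0$ in $\mathcal{F}$ and $\mathbf{f} \ne \theta$ in $\mathcal{V}'$,
show that there exists $\xi \ne \theta$ in $\mathcal{V}$ such that $\xi\mathbf{f} = c$.

2. In $\mathcal{E}_3$ the scalar or dot product of two vectors
$\alpha = (a_1, a_2, a_3)$ and $\beta = (b_1, b_2, b_3)$ is defined by

$$\alpha \cdot \beta = a_1b_1 + a_2b_2 + a_3b_3.$$

(i) Given any linear functional $\mathbf{f}$ on $\mathcal{E}_3$, let
$\beta_{\mathbf{f}} = (\epsilon_1\mathbf{f}, \epsilon_2\mathbf{f}, \epsilon_3\mathbf{f})$.
Show that $\alpha\mathbf{f} = \alpha \cdot \beta_{\mathbf{f}}$ for every
$\alpha \in \mathcal{E}_3$.

(ii) Conversely, given any $\beta \in \mathcal{E}_3$, let $\mathbf{f}_\beta$ be the
linear functional defined by $\epsilon_i\mathbf{f}_\beta = b_i$, $i = 1, 2, 3$. Show
that $\alpha\mathbf{f}_\beta = \alpha \cdot \beta$ for every $\alpha \in \mathcal{E}_3$,
and that $\mathbf{f}_\beta = \theta$ in $\mathcal{E}_3'$ if and only if
$\beta = \theta$ in $\mathcal{E}_3$.

3. In $\mathcal{E}_3$ the following vectors form a basis:

$$\alpha_1 = (0, 1, 1), \quad \alpha_2 = (1, 0, 1), \quad \alpha_3 = (1, 1, 0).$$

Determine a basis $\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ for $\mathcal{E}_3'$
which is dual to $\{\alpha_1, \alpha_2, \alpha_3\}$, and compute
$(x_1, x_2, x_3)\mathbf{f}_i$ for each i.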

4. Let $\mathcal{V}$ be a vector space over $\mathcal{F}$ and let $S \ne \emptyset$
be a subset of $\mathcal{V}$. The annihilator of S is defined to be the set of all
linear functionals on $\mathcal{V}$ which map each vector of S into 0:

$$S^\circ = \{\mathbf{f} \in \mathcal{V}' \mid \sigma\mathbf{f} = 0 \text{ for each } \sigma \in S\}.$$

Prove the following properties of annihilators.

(i) If $S \subseteq T$ in $\mathcal{V}$, then $T^\circ \subseteq S^\circ$ in
$\mathcal{V}'$.

(ii) If $\mathcal{M}$ is a subspace of $\mathcal{V}$, $\mathcal{M}^\circ$ is a
subspace of $\mathcal{V}'$. Furthermore, if $\mathcal{V}$ has dimension n and
$\mathcal{M}$ has dimension m, then $\mathcal{M}^\circ$ has dimension n - m.

(iii) If $\mathcal{V}$ is finite-dimensional, the mapping of Theorem 3.13 is an
isomorphism from $\mathcal{M}$ to $(\mathcal{M}^\circ)^\circ$.

(iv) If $\mathcal{V}$ is finite-dimensional and if $\mathcal{M}$ and $\mathcal{N}$
are subspaces of $\mathcal{V}$, then

$$(\mathcal{M} \cap \mathcal{N})^\circ = \mathcal{M}^\circ + \mathcal{N}^\circ,$$

$$(\mathcal{M} + \mathcal{N})^\circ = \mathcal{M}^\circ \cap \mathcal{N}^\circ.$$

5. Let $\mathcal{V}_n$ be a vector space over $\mathcal{F}$, and let $\mathbf{f}$ be
a fixed, nonzero linear functional on $\mathcal{V}_n$. Show that
$\mathcal{H} = \{\xi \in \mathcal{V}_n \mid \xi\mathbf{f} = 0\}$ is an
(n - 1)-dimensional subspace of $\mathcal{V}_n$. [In geometric language any
(n - 1)-dimensional subspace of an n-dimensional space is called a hyperplane.]

6. Let $\mathcal{P}$ be the infinite-dimensional space of all polynomials with real
coefficients, and let $\mathcal{S}$ be the infinite-dimensional space of all infinite
sequences of real numbers. With each real sequence $\{c_k\}$ we associate the mapping
$\mathbf{f} \in \mathcal{P}'$, which is defined from $\mathcal{P}$ to the real numbers
by specifying that $\mathbf{f}$ is linear and that $x^k\mathbf{f} = c_k$ for
$k = 0, 1, 2, \ldots$. Show that $\mathcal{P}'$ is isomorphic to $\mathcal{S}$.
(Theorem 3.11 shows that $\mathcal{V}$ and $\mathcal{V}'$ are isomorphic when
$\mathcal{V}$ is finite-dimensional. This example shows that the corresponding
result is not valid for infinite-dimensional spaces, since it is known that
$\mathcal{P}$ and $\mathcal{S}$ are not isomorphic.)

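§3.5. Transpose of a Linear Transformation

Next we shall show that each linear transformation $\mathbf{T}$ from a vector space
$\mathcal{V}$ to a vector space $\mathcal{W}$ is associated in a natural way with a
linear transformation $\mathbf{T}'$ from the dual space $\mathcal{W}'$ to the dual
space $\mathcal{V}'$, as indicated in Figure 3.4. $\mathbf{T}'$ is defined as the
transformation from $\mathcal{W}'$ to $\mathcal{V}'$ which maps each linear
functional $\mathbf{g}$ on $\mathcal{W}$ into the linear functional
$\mathbf{g}\mathbf{T}'$ on $\mathcal{V}$, the value of $\mathbf{g}\mathbf{T}'$ for
each $\alpha \in \mathcal{V}$ being the scalar $(\alpha\mathbf{T})\mathbf{g}$. Thus
in $\mathcal{F}$,

$$\alpha(\mathbf{g}\mathbf{T}') = (\alpha\mathbf{T})\mathbf{g} \quad \text{for each } \alpha \in \mathcal{V} \text{ and each } \mathbf{g} \in \mathcal{W}'.$$

Figure 3.4

This condition can be expressed as an equation in $\mathcal{V}'$:

$$\mathbf{g}\mathbf{T}' = \mathbf{T}\mathbf{g} \quad \text{for each } \mathbf{g} \in \mathcal{W}'.$$

Since $\mathbf{T}$ is a linear mapping from $\mathcal{V}$ to $\mathcal{W}$ and
$\mathbf{g}$ is a linear mapping from $\mathcal{W}$ to $\mathcal{F}$, the composite
mapping $\mathbf{T}\mathbf{g}$ is linear from $\mathcal{V}$ to $\mathcal{F}$. Hence
$\mathbf{g}\mathbf{T}' \in \mathcal{V}'$ for each $\mathbf{g} \in \mathcal{W}'$, and
$\mathbf{T}'$ is indeed a mapping from $\mathcal{W}'$ to $\mathcal{V}'$. As an
exercise you may verify that $\mathbf{T}'$ is linear.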

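Definition 3.9. Let $\mathcal{V}$ and $\mathcal{W}$ be vector spaces over
$\mathcal{F}$, and let $\mathbf{T}$ be a linear transformation from $\mathcal{V}$ to
$\mathcal{W}$. The transpose $\mathbf{T}'$ of $\mathbf{T}$ is the linear
mapping from $\mathcal{W}'$ into $\mathcal{V}'$, which is defined by

$$\mathbf{g}\mathbf{T}' = \mathbf{T}\mathbf{g} \quad \text{for each } \mathbf{g} \in \mathcal{W}'.$$

As another exercise you may verify that $\mathbf{I}' = \mathbf{I}$,
$(c\mathbf{T})' = c\mathbf{T}'$ for each $c \in \mathcal{F}$, and
$(\mathbf{S} + \mathbf{T})' = \mathbf{S}' + \mathbf{T}'$ if both $\mathbf{S}$ and
$\mathbf{T}$ are linear transformations from $\mathcal{V}$ to $\mathcal{W}$. The
equation $\mathbf{I}' = \mathbf{I}$ is interpreted as meaning that if $\mathbf{I}$
is the identity mapping on $\mathcal{V}$, then $\mathbf{I}'$ is the identity mapping
on $\mathcal{V}'$.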

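Theorem 3.14. Let $\mathbf{T}$ be a linear transformation from $\mathcal{V}$ to
$\mathcal{W}$ and $\mathbf{S}$ a linear transformation from $\mathcal{W}$ to
$\mathcal{Y}$. Then

$$(\mathbf{T}\mathbf{S})' = \mathbf{S}'\mathbf{T}'.$$

proof: Since $\mathbf{T}\mathbf{S}$ is a mapping from $\mathcal{V}$ to $\mathcal{Y}$,
$(\mathbf{T}\mathbf{S})'$ is a mapping from $\mathcal{Y}'$ to $\mathcal{V}'$;
$\mathbf{S}'$ is a mapping from $\mathcal{Y}'$ to $\mathcal{W}'$ and $\mathbf{T}'$ a
mapping from $\mathcal{W}'$ to $\mathcal{V}'$. For each $\mathbf{h} \in \mathcal{Y}'$,

$$\begin{aligned}
\mathbf{h}(\mathbf{T}\mathbf{S})' &= (\mathbf{T}\mathbf{S})\mathbf{h} \\
&= \mathbf{T}(\mathbf{S}\mathbf{h}) \\
&= \mathbf{T}(\mathbf{h}\mathbf{S}') \\
&= (\mathbf{h}\mathbf{S}')\mathbf{T}' \\
&= \mathbf{h}(\mathbf{S}'\mathbf{T}').
\end{aligned}$$

The key observation in the preceding proof is that
$\mathbf{h}\mathbf{S}' \in \mathcal{W}'$, say $\mathbf{h}\mathbf{S}' = \mathbf{g}$.
Then, by the definition of $\mathbf{T}'$,

$$\mathbf{T}(\mathbf{h}\mathbf{S}') = \mathbf{T}\mathbf{g} = \mathbf{g}\mathbf{T}' = (\mathbf{h}\mathbf{S}')\mathbf{T}'.$$

By applying Theorem 3.14 to the case in which $\mathbf{T} = \mathbf{S}^{-1}$, you
may prove that if $\mathbf{S}$ is nonsingular, then $\mathbf{S}'$ is nonsingular and

$$(\mathbf{S}')^{-1} = (\mathbf{S}^{-1})'.$$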
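
The defining equation of the transpose and the identity of Theorem 3.14 can be tried out in coordinates. The sketch below rests on assumptions that are not part of the text at this point: a transformation is stored as an array acting on row vectors from the right, a functional is stored as a column of coefficients, and the transpose is then modeled by letting the array act on that column. All names and numerical values are hypothetical.

```python
def apply(x, a):
    """Row vector x times array a: coordinates of the image of x."""
    return [sum(x[i] * a[i][j] for i in range(len(a))) for j in range(len(a[0]))]

def act(a, g):
    """Array a times column g: coefficients of the functional g T'."""
    return [sum(a[i][j] * g[j] for j in range(len(g))) for i in range(len(a))]

def evaluate(x, g):
    """Scalar value of the functional g on the vector x."""
    return sum(xi * gi for xi, gi in zip(x, g))

def product(a, b):
    """Array representing the successive mapping: first a, then b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1, 2, 0], [0, 1, -1]]      # hypothetical T from a 2-space to a 3-space
B = [[1, 0], [2, 1], [0, 3]]     # hypothetical S from the 3-space to a 2-space
x = [1, 1]                       # a vector of the first space
g = [1, 0, 2]                    # a functional on the middle space
h = [4, -1]                      # a functional on the last space

# Defining property: alpha (g T') = (alpha T) g.
lhs = evaluate(x, act(A, g))
rhs = evaluate(apply(x, A), g)

# Theorem 3.14: h (TS)' = (h S') T'.
via_product = act(product(A, B), h)
via_steps = act(A, act(B, h))
```

In this model the theorem reduces to the associativity of the sums involved, which matches the chain of equalities in the proof.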

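Theorem 3.15. Let $\mathbf{T}$ be a linear transformation from $\mathcal{V}_m$ to
$\mathcal{W}_n$. Then $\mathbf{T}$ and $\mathbf{T}'$ have the same rank.

proof: Let $\rho(\mathbf{T}) = k$; then $\mathcal{R}_{\mathbf{T}}$ is a
k-dimensional subspace of $\mathcal{W}_n$. By Exercise 4(ii), §3.4, the annihilator
$\mathcal{R}_{\mathbf{T}}^\circ$ of $\mathcal{R}_{\mathbf{T}}$ is a subspace of
$\mathcal{W}_n'$ of dimension n - k.

Then we have

$f \in \mathcal{R}_{\mathbf{T}}^\circ$ if and only if $\eta\mathbf{f} = 0$ for every $\eta \in \mathcal{R}_{\mathbf{T}}$,
if and only if $(\xi\mathbf{T})\mathbf{f} = 0 = \xi(f\mathbf{T}')$ for every $\xi \in \mathcal{V}_m$,
if and only if $f\mathbf{T}' = \theta$ in $\mathcal{V}_m'$,
if and only if $f \in \mathcal{N}_{\mathbf{T}'}$.

Thus $\mathcal{R}_{\mathbf{T}}^\circ = \mathcal{N}_{\mathbf{T}'}$, and
$\nu(\mathbf{T}') = n - k$. Since $\mathbf{T}'$ is a linear mapping of the
n-dimensional space $\mathcal{W}_n'$, its rank and nullity add up to n, so
$\rho(\mathbf{T}') = n - \nu(\mathbf{T}') = k = \rho(\mathbf{T})$.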

Exercises 

1. Prove that the transpose of a linear mapping is linear. 

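2. Prove that $\mathbf{I}' = \mathbf{I}$, $(c\mathbf{T})' = c\mathbf{T}'$, and
$(\mathbf{S} + \mathbf{T})' = \mathbf{S}' + \mathbf{T}'$.

3. Prove that if $\mathbf{S}$ is nonsingular, then $\mathbf{S}'$ is nonsingular and
$(\mathbf{S}')^{-1} = (\mathbf{S}^{-1})'$.

4. If $\mathbf{T}$ is a linear transformation from $\mathcal{V}$ to $\mathcal{W}$,
state precisely what is meant by $(\mathbf{T}')'$. If $\mathcal{V}$ and
$\mathcal{W}$ are finite-dimensional, use Theorem 3.13 to interpret the equation
$(\mathbf{T}')' = \mathbf{T}$, and then prove it.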


§3.6. Linear Algebras 

In §3.1 we observed that a special situation arises when we consider the
system $\mathcal{L}$ of all linear transformations of a vector space $\mathcal{V}$
into $\mathcal{V}$ itself. $\mathcal{L}$ is a vector space in which the “vectors”
are linear mappings of $\mathcal{V}$ into $\mathcal{V}$; therefore a product
$\mathbf{S} \,\square\, \mathbf{T}$ of “vectors” can be defined by means of the
operation of successive mapping $\mathbf{S}\mathbf{T}$. In general, a vector space
over $\mathcal{F}$ in which a suitable product of vectors is defined is called an
algebra over $\mathcal{F}$; there are various types of algebras, classified
according to the properties of that product.

Definition 3.10. A linear algebra $\mathcal{L}$ over a field $\mathcal{F}$ is a system

$$\mathcal{L} = \{L, F;\ +,\ \cdot,\ \oplus,\ \odot,\ \square\}$$

which satisfies the postulates:

(a) the system $\{L, F;\ +,\ \cdot,\ \oplus,\ \odot\}$ is a vector space over
$\mathcal{F}$,

(b) $\square$ is a binary operation on $\mathcal{L}$ which is closed, associative,
and bilinear.

This second postulate requires elaboration, but first we agree to dispense
with the special notation. Then (b) simply asserts that for all
$a, b \in \mathcal{F}$ and all
$\mathbf{T}_1, \mathbf{T}_2, \mathbf{T}_3 \in \mathcal{L}$ the product operation is

Closed: $\mathbf{T}_1\mathbf{T}_2 \in \mathcal{L}$,

Associative: $\mathbf{T}_1(\mathbf{T}_2\mathbf{T}_3) = (\mathbf{T}_1\mathbf{T}_2)\mathbf{T}_3$,

Bilinear: $\mathbf{T}_1(a\mathbf{T}_2 + b\mathbf{T}_3) = a\mathbf{T}_1\mathbf{T}_2 + b\mathbf{T}_1\mathbf{T}_3$ and
$(a\mathbf{T}_2 + b\mathbf{T}_3)\mathbf{T}_1 = a\mathbf{T}_2\mathbf{T}_1 + b\mathbf{T}_3\mathbf{T}_1$.

The dimension of $\mathcal{L}$ is defined to be its dimension as a vector space.


Theorem 3.16. The system $\mathcal{L}$ of all linear transformations from a vector
space $\mathcal{V}$ over $\mathcal{F}$ into $\mathcal{V}$ is a linear algebra over
$\mathcal{F}$. If $\mathcal{V}$ is of dimension n, then $\mathcal{L}$ is of
dimension $n^2$.

proof: We have already verified that $\mathcal{L}$ is a vector space over
$\mathcal{F}$ and that the product of linear mappings is linear and associative
(Theorem 3.1 and Exercise 9 of §3.1). Hence we need only show that the product is
bilinear: for each vector $\alpha$,

$$\begin{aligned}
\alpha[\mathbf{T}_1(a\mathbf{T}_2 + b\mathbf{T}_3)] &= (\alpha\mathbf{T}_1)(a\mathbf{T}_2 + b\mathbf{T}_3) \\
&= (\alpha\mathbf{T}_1)(a\mathbf{T}_2) + (\alpha\mathbf{T}_1)(b\mathbf{T}_3) \\
&= a(\alpha\mathbf{T}_1)\mathbf{T}_2 + b(\alpha\mathbf{T}_1)\mathbf{T}_3 \\
&= a(\alpha\mathbf{T}_1\mathbf{T}_2) + b(\alpha\mathbf{T}_1\mathbf{T}_3) \\
&= \alpha(a\mathbf{T}_1\mathbf{T}_2) + \alpha(b\mathbf{T}_1\mathbf{T}_3) \\
&= \alpha(a\mathbf{T}_1\mathbf{T}_2 + b\mathbf{T}_1\mathbf{T}_3).
\end{aligned}$$

A similar calculation verifies the second condition of bilinearity. Thus
$\mathcal{L}$ is a linear algebra. To prove that the dimension of $\mathcal{L}$ is
$n^2$ we use Theorem 2.14 to represent $\mathcal{V}_n$ as n-tuples of elements of
$\mathcal{F}$, and define the $n^2$ linear transformations $\mathbf{T}_{ij}$, where
$i, j = 1, \ldots, n$, by

$$(x_1, \ldots, x_n)\mathbf{T}_{ij} = (0, \ldots, 0, x_i, 0, \ldots, 0),$$

where the jth component of the image vector is $x_i$, and all other components
are zero. It can be shown that the $\mathbf{T}_{ij}$ are linearly independent
linear transformations which span $\mathcal{L}$. However this fact is more readily
seen in terms of matrices, so the completion of this proof is deferred
until §4.3.

One additional remark is of interest here. An abstract linear algebra may
or may not have an identity of multiplication. However, it can be shown that
any linear algebra with an identity and of dimension k is isomorphic to a
subalgebra of the algebra of all linear transformations on $\mathcal{V}_k$. This
fact provides a concrete representation of any such abstract linear algebra and a
striking illustration of the generality of linear transformations and their
importance in linear algebra.

Exercises 

1. Let $\mathcal{A}$ be a linear algebra over $\mathcal{F}$ and let
$\{\alpha_1, \ldots, \alpha_m\}$ be a basis for $\mathcal{A}$. The product of any
two elements of $\mathcal{A}$ is an element of $\mathcal{A}$ and hence is a
linear combination of the $\alpha_i$. Hence each pair $\alpha_i, \alpha_j$ of basis
vectors determines m scalars $c_{ijk}$, $k = 1, \ldots, m$, such that

$$\alpha_i\alpha_j = \sum_{k=1}^{m} c_{ijk}\alpha_k \quad \text{for all } i, j = 1, \ldots, m.$$

(i) Show that the product of any two elements of $\mathcal{A}$ is determined
by the $m^3$ scalars $c_{ijk}$.

(ii) Find a necessary and sufficient condition on the scalars $c_{ijk}$ that
the algebra be commutative ($\xi\eta = \eta\xi$ for all $\xi, \eta \in \mathcal{A}$).

(iii) Show that any finite-dimensional vector space can be made into a
linear algebra by defining the trivial product in which $c_{ijk} = 0$ for all
i, j, k.

2. An important example of a linear algebra of dimension four, given a
century ago by Hamilton, was a forerunner of the study of matrices. The
elements of the algebra are called quaternions, and the scalars are the real
numbers. In a notation similar to that of the complex numbers, a quaternion
is an expression of the form

$$a_1 1 + a_2 i + a_3 j + a_4 k.$$

Equality, sum, and scalar multiple are defined component by component;
quaternion product is defined by bilinearity and the following multiplication
table for the basis elements, wherein the product xy appears in the row
labeled x at the left and in the column labeled y at the top:

         |   1     i     j     k
    -----+------------------------
      1  |   1     i     j     k
      i  |   i    -1     k    -j
      j  |   j    -k    -1     i
      k  |   k     j    -i    -1

(i) Verify that this product is closed. All other postulates of a linear
algebra are also satisfied.

(ii) Show that every quaternion except 0 + 0i + 0j + 0k has an
inverse relative to this product; that is,

$$(a_1 + a_2 i + a_3 j + a_4 k)(b_1 + b_2 i + b_3 j + b_4 k) = 1 + 0i + 0j + 0k$$

for suitable $b_1, b_2, b_3, b_4$.

(iii) Is the product commutative?

From this we conclude that the quaternions form a noncommutative
“division algebra.” An important theorem of Frobenius proves that the
quaternions form the only noncommutative finite-dimensional division algebra
over the real numbers.

§3.7. Specific Form of a Linear Transformation 

Finally we approach the bridge which connects linear transformations and
matrices. Let $\mathbf{T}$ be a linear transformation from $\mathcal{V}_m$ into
$\mathcal{W}_n$. We choose any basis $\{\alpha_1, \ldots, \alpha_m\}$ for
$\mathcal{V}_m$ and any basis $\{\beta_1, \ldots, \beta_n\}$ for $\mathcal{W}_n$, and
then consider the images $\alpha_i\mathbf{T}$ of the basis vectors of
$\mathcal{V}_m$, expressed in terms of the basis vectors of $\mathcal{W}_n$. Since
$\alpha_i\mathbf{T} \in \mathcal{W}_n$ for each i, there is a unique representation
for $\alpha_i\mathbf{T}$ as a linear combination of the vectors in the β-basis;
that is,

$$\alpha_i\mathbf{T} = a_{i1}\beta_1 + a_{i2}\beta_2 + \cdots + a_{in}\beta_n = \sum_{j=1}^{n} a_{ij}\beta_j, \quad \text{for each } i = 1, 2, \ldots, m.$$

This set of m linear equations describes the effect of $\mathbf{T}$ on the vectors
of the α-basis. But any vector $\xi \in \mathcal{V}_m$ can be represented uniquely
as a linear combination of the vectors in the α-basis:

$$\xi = \sum_{i=1}^{m} x_i\alpha_i.$$

Hence

$$\begin{aligned}
\xi\mathbf{T} &= \Big(\sum_{i=1}^{m} x_i\alpha_i\Big)\mathbf{T} = \sum_{i=1}^{m} x_i(\alpha_i\mathbf{T}) \\
&= \sum_{i=1}^{m} x_i\Big(\sum_{j=1}^{n} a_{ij}\beta_j\Big) = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} x_ia_{ij}\beta_j\Big) \\
&= \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} x_ia_{ij}\Big)\beta_j.
\end{aligned}$$

Therefore the effect of $\mathbf{T}$ on any vector $\xi$ is described in terms of
the scalars $x_i$ (which represent $\xi$ in terms of the α-basis) and the scalars
$a_{ij}$ (which describe the effect of $\mathbf{T}$ on the vectors of the α-basis,
expressed in terms of the β-basis).
This representation of $\mathbf{T}$ is so vital in our future work that we repeat,
for emphasis, the idea involved. Given a linear transformation $\mathbf{T}$ from
$\mathcal{V}_m$ to $\mathcal{W}_n$ we choose a basis for $\mathcal{V}_m$ and a basis
for $\mathcal{W}_n$. The image of each of the m basis vectors of $\mathcal{V}_m$ is
described by n scalars which depend upon $\mathbf{T}$ and upon the basis for
$\mathcal{W}_n$. Furthermore, these mn scalars describe $\mathbf{T}$ completely
since the image of any given vector can be determined from the mn scalars.

Conversely, with respect to a fixed choice of bases for $\mathcal{V}_m$ and
$\mathcal{W}_n$, any mn scalars determine a linear transformation; given $a_{ij}$,
$i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, let $\mathbf{T}$ be the linear
transformation defined by the equations

$$\begin{aligned}
\alpha_1\mathbf{T} &= a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1n}\beta_n \\
\alpha_2\mathbf{T} &= a_{21}\beta_1 + a_{22}\beta_2 + \cdots + a_{2n}\beta_n \\
&\ \ \vdots \\
\alpha_m\mathbf{T} &= a_{m1}\beta_1 + a_{m2}\beta_2 + \cdots + a_{mn}\beta_n.
\end{aligned}$$

Since a linear transformation is determined by its effect on any basis
(Exercise 7, §3.1), $\xi\mathbf{T}$ is defined for all $\xi \in \mathcal{V}_m$. These
results are summarized in the following theorem.

Theorem 3.17. Let $\mathbf{T}$ be a linear transformation from $\mathcal{V}_m$ to
$\mathcal{W}_n$, both over $\mathcal{F}$. With respect to a chosen pair of bases,
$\mathbf{T}$ determines a set of mn scalars, arranged in a rectangular array of m
rows and n columns. Conversely, each such array of mn scalars determines uniquely
(using the convention just described) a linear transformation from $\mathcal{V}_m$
to $\mathcal{W}_n$.

In the next chapter we shall consider rectangular arrays of scalars, calling
each such array a matrix. Theorem 3.17 tells us that every linear
transformation determines an array of this type, relative to fixed bases, and every
array determines a linear transformation, again with fixed bases being used
to define the transformation. In order that these arrays be useful in
representing facts about linear transformations, we shall define operations on
matrices in such a way as to imitate the corresponding operations on linear
transformations.

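For linear transformations which map $\mathcal{V}$ into itself, only one basis is
needed. To illustrate, let us consider the representation of the product of
two linear transformations $\mathbf{T}$ and $\mathbf{S}$ from $\mathcal{V}$ to
$\mathcal{V}$. Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for $\mathcal{V}$.
Then, using the method just described, we obtain $n^2$ scalars $a_{ik}$
which represent $\mathbf{T}$ and $n^2$ scalars $b_{kj}$ which represent
$\mathbf{S}$, both in terms of the α-basis:

$$\alpha_i\mathbf{T} = \sum_{k=1}^{n} a_{ik}\alpha_k, \quad i = 1, \ldots, n,$$

$$\alpha_k\mathbf{S} = \sum_{j=1}^{n} b_{kj}\alpha_j, \quad k = 1, \ldots, n.$$

To find a representation of $\mathbf{T}\mathbf{S}$ we calculate as follows:

$$\begin{aligned}
\alpha_i(\mathbf{T}\mathbf{S}) &= (\alpha_i\mathbf{T})\mathbf{S} = \Big(\sum_{k=1}^{n} a_{ik}\alpha_k\Big)\mathbf{S} = \sum_{k=1}^{n} a_{ik}(\alpha_k\mathbf{S}) \\
&= \sum_{k=1}^{n} a_{ik}\Big(\sum_{j=1}^{n} b_{kj}\alpha_j\Big) \\
&= \sum_{k=1}^{n}\Big(\sum_{j=1}^{n} a_{ik}b_{kj}\alpha_j\Big) \\
&= \sum_{j=1}^{n}\Big(\sum_{k=1}^{n} a_{ik}b_{kj}\alpha_j\Big) \\
&= \sum_{j=1}^{n}\Big(\sum_{k=1}^{n} a_{ik}b_{kj}\Big)\alpha_j.
\end{aligned}$$

The last form of the equation can be written

$$\alpha_i(\mathbf{T}\mathbf{S}) = \Big(\sum_{k=1}^{n} a_{ik}b_{k1}\Big)\alpha_1 + \Big(\sum_{k=1}^{n} a_{ik}b_{k2}\Big)\alpha_2 + \cdots + \Big(\sum_{k=1}^{n} a_{ik}b_{kn}\Big)\alpha_n$$

for $i = 1, \ldots, n$. Hence the $n^2$ scalars which represent
$\mathbf{T}\mathbf{S}$ relative to the α-basis are

$$\sum_{k=1}^{n} a_{ik}b_{kj}, \quad i, j = 1, 2, \ldots, n.$$

This observation is precisely what guides us in the next chapter, where the
product of matrices is defined formally.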
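
The rule just derived can be tested on a small case: mapping a vector first by T and then by S should give the same coordinates as mapping it once with the scalars $\sum_k a_{ik}b_{kj}$. The sketch below assumes coordinates are kept as lists and arrays as lists of rows; the helper names and the numerical values are hypothetical.

```python
def apply(x, a):
    """Coordinates of the image of x: y_j = sum over i of x_i * a[i][j]."""
    return [sum(x[i] * a[i][j] for i in range(len(a))) for j in range(len(a[0]))]

def product(a, b):
    """Scalars c_ij = sum over k of a_ik * b_kj representing the product TS."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

a = [[0, 1], [-1, 0]]    # hypothetical scalars representing T
b = [[1, 2], [0, 1]]     # hypothetical scalars representing S
x = [3, 4]               # coordinates of a vector in the alpha-basis

step_by_step = apply(apply(x, a), b)     # first T, then S
in_one_step = apply(x, product(a, b))    # the single array for TS
```

Both computations give the same coordinates, as the derivation predicts.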

We now return to the simpler case of representing a single transformation
relative to a chosen basis. To illustrate how the $a_{ij}$ depend on the chosen
basis as well as on $\mathbf{T}$, let us consider the rotation of the plane through
an angle of 90°. Referring to the usual coordinates, choose $\epsilon_1 = (1, 0)$ and
$\epsilon_2 = (0, 1)$ as a basis. Then

$$\epsilon_1\mathbf{T} = (1, 0)\mathbf{T} = (0, 1) = 0\epsilon_1 + 1\epsilon_2,$$

$$\epsilon_2\mathbf{T} = (0, 1)\mathbf{T} = (-1, 0) = -1\epsilon_1 + 0\epsilon_2,$$

and the four scalars which represent $\mathbf{T}$ are

$$a_{11} = 0, \quad a_{12} = 1, \quad a_{21} = -1, \quad a_{22} = 0.$$

Now let us choose a different basis, say $\alpha_1 = (1, 1)$ and
$\alpha_2 = (-1, 0)$. Then

$$\alpha_1\mathbf{T} = (1, 1)\mathbf{T} = (-1, 1) = 1\alpha_1 + 2\alpha_2,$$

$$\alpha_2\mathbf{T} = (-1, 0)\mathbf{T} = (0, -1) = -1\alpha_1 + (-1)\alpha_2,$$

and the four scalars which represent $\mathbf{T}$ with respect to the new basis are

$$b_{11} = 1, \quad b_{12} = 2, \quad b_{21} = -1, \quad b_{22} = -1.$$
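
The two sets of scalars for the rotation can be recomputed mechanically. The following sketch assumes the plane is modeled by pairs of numbers; the helper names `rotate`, `coordinates`, and `represent` are hypothetical, and the coordinates relative to a basis are found by Cramer's rule for a system of two equations.

```python
from fractions import Fraction

def rotate(v):
    """Rotation of the plane through 90 degrees: (x, y) goes to (-y, x)."""
    x, y = v
    return (-y, x)

def coordinates(v, basis):
    """Scalars (c1, c2) with v = c1*basis[0] + c2*basis[1], by Cramer's rule."""
    (p, q), (r, s) = basis
    det = Fraction(p * s - q * r)     # nonzero exactly when the pair is a basis
    c1 = (v[0] * s - v[1] * r) / det
    c2 = (p * v[1] - q * v[0]) / det
    return (c1, c2)

def represent(transformation, basis):
    """Row i holds the coordinates of the image of the ith basis vector."""
    return [coordinates(transformation(b), basis) for b in basis]

usual = [(1, 0), (0, 1)]
other = [(1, 1), (-1, 0)]

scalars_usual = represent(rotate, usual)
scalars_other = represent(rotate, other)
```

The same transformation yields different arrays of scalars for the two bases, which is the dependence on the basis that the example is meant to show.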
In our thinking about linear transformations we have hitherto used the
phrase “image of $\alpha$ under $\mathbf{T}$” to describe $\alpha\mathbf{T}$. Using
the plane for simplicity, the geometric picture we have of $\mathbf{T}$ is that
$\mathbf{T}$ performs a rearrangement of the vectors (points) of the plane. Each
point is moved by $\mathbf{T}$ into a new position (which might happen to coincide
with the old one). In case $\mathbf{T}$ is nonsingular, a second means of
interpreting $\mathbf{T}$ is sometimes convenient. If $\mathbf{T}$ is nonsingular,
the image of a basis is again a basis. But in Chapter 2 we saw that each basis
determines a coordinate system, so a change of basis is nothing more than a change
of coordinates. Thus we may picture the points of the plane not being moved about
by $\mathbf{T}$ but remaining fixed and acquiring a new set of coordinates.

To summarize, a nonsingular transformation may be interpreted in two
ways:

Dynamic (Alibi). $\mathbf{T}$ moves each point P into a point $P\mathbf{T}$ in such
a way that if $P \ne Q$ then $P\mathbf{T} \ne Q\mathbf{T}$.

Static (Alias). $\mathbf{T}$ assigns new coordinates to each point in such a way
that if the old coordinates of P and Q are different, then the new coordinates of
P and Q also differ.

For the example given above, the dynamic interpretation is illustrated by
Figure 3.5.

Figure 3.5

Alternately, in the static interpretation we consider the four pairs
of vectors as defining four coordinate systems in the plane. If a point P has
coordinates (a, b) in the $\epsilon_1, \epsilon_2$ basis, then by direct computation
P has coordinates

(b, -a) in the $\epsilon_1\mathbf{T}, \epsilon_2\mathbf{T}$ basis,

(b, b - a) in the $\alpha_1, \alpha_2$ basis,

(-a, -a - b) in the $\alpha_1\mathbf{T}, \alpha_2\mathbf{T}$ basis.
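
The three coordinate formulas can be spot-checked for a particular point. The sketch below is an illustration only: the point (a, b) = (3, 5) is a hypothetical choice, and `coordinates` is an assumed helper that solves the two equations by Cramer's rule.

```python
from fractions import Fraction

def coordinates(v, basis):
    """Scalars (c1, c2) with v = c1*basis[0] + c2*basis[1], by Cramer's rule."""
    (p, q), (r, s) = basis
    det = Fraction(p * s - q * r)
    return ((v[0] * s - v[1] * r) / det, (p * v[1] - q * v[0]) / det)

def rotate(v):
    """Rotation of the plane through 90 degrees."""
    return (-v[1], v[0])

a, b = 3, 5                       # hypothetical coordinates of P in the usual basis
point = (a, b)

usual = [(1, 0), (0, 1)]
other = [(1, 1), (-1, 0)]
usual_rotated = [rotate(v) for v in usual]
other_rotated = [rotate(v) for v in other]

in_usual_rotated = coordinates(point, usual_rotated)
in_other = coordinates(point, other)
in_other_rotated = coordinates(point, other_rotated)
```

For this point the three results agree with (b, -a), (b, b - a), and (-a, -a - b) respectively.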


Exercises 

1. A linear transformation $\mathbf{T}$ of the plane is known to carry the point
P(1, 1) into the point P'(-2, 0) and to carry Q(0, 1) into Q'(-1, 1).

(i) Choose $\{\epsilon_1, \epsilon_2\}$ as a basis, and determine
$\epsilon_1\mathbf{T}$ and $\epsilon_2\mathbf{T}$.

(ii) What scalars represent $\mathbf{T}$ relative to $\{\epsilon_1, \epsilon_2\}$?

(iii) What is the image under $\mathbf{T}$ of an arbitrary point R(a, b)?

(iv) Choose the basis vectors $\{\beta_1, \beta_2\}$ to be the points P and Q. What
scalars represent $\mathbf{T}$ relative to $\{\beta_1, \beta_2\}$?

(v) Show that $\mathbf{T}$ is nonsingular, and find scalars which represent
$\mathbf{T}^{-1}$ relative to $\{\epsilon_1, \epsilon_2\}$.

2. Let $\alpha_1 = \epsilon_1 + 2\epsilon_2$, $\alpha_2 = -\epsilon_1 + \epsilon_2$,
and let $\mathbf{S}$ be the linear transformation determined by this change of
basis; that is, $\epsilon_1\mathbf{S} = \alpha_1$, $\epsilon_2\mathbf{S} = \alpha_2$.

(i) What scalars represent $\mathbf{S}$ relative to $\{\epsilon_1, \epsilon_2\}$?

(ii) What scalars represent $\mathbf{S}$ relative to $\{\alpha_1, \alpha_2\}$?

(iii) What scalars represent $\mathbf{S}^{-1}$ relative to $\{\epsilon_1, \epsilon_2\}$?

(iv) Find the image under $\mathbf{S}$ of the vector $a\epsilon_1 + b\epsilon_2$.

3. Referring to Exercises 1 and 2 for the definitions of $\mathbf{S}$ and
$\mathbf{T}$, consider the product transformations $\mathbf{T}\mathbf{S}$ and
$\mathbf{S}\mathbf{T}$.

(i) What scalars represent $\mathbf{T}\mathbf{S}$ relative to
$\{\epsilon_1, \epsilon_2\}$? Answer in two ways: first by determining
$\epsilon_1\mathbf{T}\mathbf{S}$ and $\epsilon_2\mathbf{T}\mathbf{S}$, and second by
applying the formula $c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$ of §3.7.

(ii) Similarly, calculate the scalars which represent $\mathbf{S}\mathbf{T}$
relative to $\{\epsilon_1, \epsilon_2\}$.

(iii) Find the images of the point (a, b) under the transformations
$\mathbf{T}\mathbf{S}$ and $\mathbf{S}\mathbf{T}$.

4. Consider a linear transformation $\mathbf{T}$ of $\mathcal{V}_m$ onto
$\mathcal{W}_n$ and a linear transformation $\mathbf{S}$ of $\mathcal{W}_n$ onto
$\mathcal{Y}_p$. Choose bases $\{\alpha_1, \ldots, \alpha_m\}$ for $\mathcal{V}_m$,
$\{\beta_1, \ldots, \beta_n\}$ for $\mathcal{W}_n$, and
$\{\gamma_1, \ldots, \gamma_p\}$ for $\mathcal{Y}_p$. Then $\mathbf{T}$ is
represented relative to the α-basis for $\mathcal{V}_m$ and the β-basis for
$\mathcal{W}_n$ by the mn scalars $a_{ik}$, where

$$\alpha_i\mathbf{T} = \sum_{k=1}^{n} a_{ik}\beta_k, \quad i = 1, \ldots, m.$$

Likewise, $\mathbf{S}$ is represented relative to the β-basis for $\mathcal{W}_n$
and the γ-basis for $\mathcal{Y}_p$ by the np scalars $b_{kj}$, where

$$\beta_k\mathbf{S} = \sum_{j=1}^{p} b_{kj}\gamma_j, \quad k = 1, \ldots, n.$$

Show that the product transformation $\mathbf{T}\mathbf{S}$ of $\mathcal{V}_m$ onto
$\mathcal{Y}_p$ is represented relative to the α-basis for $\mathcal{V}_m$ and the
γ-basis for $\mathcal{Y}_p$ by mp scalars:

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \quad \text{for } i = 1, \ldots, m \text{ and } j = 1, \ldots, p.$$

5. (i) Prove that $\mathbf{T}$ is idempotent if and only if
$\eta\mathbf{T} = \eta$ for every $\eta \in \mathcal{R}_{\mathbf{T}}$.

(ii) Prove that if $\mathbf{T}$ is idempotent, there exists a basis for
$\mathcal{V}$ such that $\alpha_i\mathbf{T} = \alpha_i$ for
$1 \le i \le \rho(\mathbf{T})$ and $\alpha_i\mathbf{T} = \theta$ for
$\rho(\mathbf{T}) < i \le n$.

(iii) What array of scalars represents an idempotent transformation
relative to the basis described in (ii)?



CHAPTER 4 


Matrices 


§4.1. Matrices and Matrix Operations 

The considerations of the last section show that each linear transformation
$\mathbf{T}$ from $\mathcal{V}_m$ to $\mathcal{W}_n$ determines an array of mn
scalars which describe $\mathbf{T}$ completely. The determination of these scalars
requires the choice of a basis for $\mathcal{V}_m$ and a basis for $\mathcal{W}_n$,
and various choices of bases lead to different sets of scalars, each set
representing $\mathbf{T}$ relative to a suitable pair of bases. A fundamental
method of linear algebra is to investigate the nature of $\mathbf{T}$ by observing
properties of an array of scalars which represent $\mathbf{T}$, and vice versa.

Let $\mathcal{V}_m$ be a vector space with an arbitrary but fixed basis
$\{\alpha_1, \ldots, \alpha_m\}$. Let $\mathbf{T}$ be a linear transformation of
$\mathcal{V}_m$ into a vector space $\mathcal{W}_n$, and let
$\{\beta_1, \ldots, \beta_n\}$ be any fixed basis for $\mathcal{W}_n$. For each
$i = 1, 2, \ldots, m$, $\alpha_i\mathbf{T}$ is a uniquely determined vector of
$\mathcal{W}_n$ and hence is uniquely represented as a linear combination of the
$\beta_j$, $j = 1, 2, \ldots, n$:

$$\begin{aligned}
\alpha_1\mathbf{T} &= a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1n}\beta_n, \\
\alpha_2\mathbf{T} &= a_{21}\beta_1 + a_{22}\beta_2 + \cdots + a_{2n}\beta_n, \\
&\ \ \vdots \\
\alpha_m\mathbf{T} &= a_{m1}\beta_1 + a_{m2}\beta_2 + \cdots + a_{mn}\beta_n.
\end{aligned} \tag{4.1}$$

Notice the meaning of the subscripts: the first subscript i of $a_{ij}$ means that
$a_{ij}$ is one of the coefficients of the representation of the vector
$\alpha_i\mathbf{T}$ relative to the β-basis, and the second subscript j of
$a_{ij}$ means that $a_{ij}$ is the coefficient of $\beta_j$ in that representation.
Relative to the two bases, $\mathbf{T}$ is completely determined by the mn scalars
$a_{ij}$, together with this interpretation of the meaning of the subscripts. This
means not only that we have selected bases for $\mathcal{V}_m$ and $\mathcal{W}_n$,
but that we have written the basis vectors in a specific order in
each case, and we tacitly agree to observe this order. If we were to
interchange the order, say of $\alpha_1$ and $\alpha_2$, then we would merely
interchange the first two lines of (4.1), whereas an interchange of $\beta_1$ and
$\beta_2$ would interchange the first two columns on the right hand side. The mn
scalars representing $\mathbf{T}$ would be the same, but they would be written in a
different arrangement from that of (4.1). To avoid ambiguity, therefore, we agree
to pay attention to the order of the basis vectors.

With this convention understood, we can dispense with writing the $\alpha_i$ and
$\beta_j$, and represent $\mathbf{T}$ by the rectangular array of scalars,

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$

This array of m rows and n columns of field elements is denoted more
compactly by $(a_{ij})$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$. The component
$a_{ij}$ is the scalar in the ith row and the jth column, so the first index i is
called the row index, and the second index j is called the column index.

Before we proceed it is appropriate to emphasize that we have agreed to
represent $\mathbf{T}$ by the set of scalars arranged exactly as they appear as
coefficients in (4.1). This convention is adopted in order that the algebraic
properties of such arrays reflect corresponding properties of the linear
transformations which they represent, and it is closely related to our earlier
decision to use right-hand notation for linear transformations. Had we chosen to
use left-hand notation, writing $\mathbf{T}(\xi)$ instead of $\xi\mathbf{T}$, we
would represent $\mathbf{T}$ by a different array of scalars, namely

$$\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{pmatrix}.$$

This new array is obtained by interchanging the roles of rows and columns in
the previous array. That is, in left-hand notation the first column of scalars
specifies the coefficients of the vector $\mathbf{T}(\alpha_1)$, the second column
those of $\mathbf{T}(\alpha_2)$, and so on. In right-hand notation the first row
specifies the coefficients of $\alpha_1\mathbf{T}$, the second row those of
$\alpha_2\mathbf{T}$, and so on.

Definition 4.1. A rectangular array containing m rows and n columns
of elements of a field $\mathcal{F}$ is called an m × n matrix over $\mathcal{F}$.

More completely, a matrix is an element of an abstract system for which
several relations and operations are defined, and a study of this system
comprises matrix theory. In defining relations and operations for matrices
we shall be guided by the principle that if matrices are to be used to represent
linear transformations, their algebraic properties must reflect those of linear
transformations.

First let us consider equality; two linear transformations are equal if and
only if they have exactly the same effect on each vector, and therefore if
and only if they have the same effect on each vector of a basis. The latter
means that two equal linear transformations have identical matrix
representations relative to a fixed choice of ordered bases for $\mathcal{V}_m$ and
$\mathcal{W}_n$.


Definition 4.2. Two m × n matrices $A = (a_{ij})$ and $B = (b_{ij})$ are equal if
and only if $a_{ij} = b_{ij}$ for every $i = 1, 2, \ldots, m$ and every
$j = 1, 2, \ldots, n$.

The matrix representations of the transformations $\mathbf{T} + \mathbf{S}$ and
$c\mathbf{T}$, expressed in terms of the representations of $\mathbf{T}$ and
$\mathbf{S}$, provide the motivation for the following two definitions.

Definition 4.3. The sum of two m × n matrices $A = (a_{ij})$ and $B = (b_{ij})$
is the m × n matrix $C = (c_{ij})$, where

$$c_{ij} = a_{ij} + b_{ij}$$

for every $i = 1, 2, \ldots, m$ and every $j = 1, 2, \ldots, n$.

Definition 4.4. The scalar multiple of an m × n matrix $A = (a_{ij})$ by
a scalar c is the m × n matrix $C = (c_{ij})$, where

$$c_{ij} = ca_{ij}$$

for every $i = 1, 2, \ldots, m$ and every $j = 1, 2, \ldots, n$.
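
Definitions 4.3 and 4.4 translate directly into componentwise operations. The sketch below stores a matrix as a list of rows; the function names `add` and `scale` and the two 2 × 3 matrices are hypothetical choices made only for this illustration.

```python
def add(a, b):
    """Sum of two matrices of the same dimensions: c_ij = a_ij + b_ij."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("the sum is defined only for matrices of the same dimensions")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def scale(c, a):
    """Scalar multiple of a matrix: c_ij = c * a_ij."""
    return [[c * x for x in row] for row in a]

# Hypothetical 2 x 3 matrices.
A = [[1, 0, -2],
     [3, 1, 4]]
B = [[2, 5, 1],
     [0, -1, 6]]

total = add(A, B)
triple = scale(3, A)
```

The dimension check in `add` reflects the fact that a sum is defined only when both matrices have the same number of rows and the same number of columns.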


Notice in particular that equality, sum, and scalar multiple are all defined 
component by component. Also note that equality and sum are defined only 
for matrices of the same dimensions. An example for 2 X 3 matrices will 
make the definitions clear. Let 


Then 


and 



3A 
Since addition and scalar multiplication of matrices are defined in terms
of the field operations, we may expect certain properties of the field operations
to be inherited by matrices. For example, matrix addition is associative and
commutative, and scalar multiplication is distributive over matrix addition.
These properties, and others listed below, may be proved as exercises.

1. A + B = B + A. 

2. (A + B) + C = A + (B + C). 

3. c(A + B) = cA + cB. 

4. (c + d)A = cA + dA. 

In particular, the m X n matrix Z, all of whose components are zero, is the 
identity of addition for all m X n matrices. 

5. A + Z = A. 

6. A + (-l)A = Z. 

7. 0A = Z.

Next consider how matrix multiplication should be defined in order to
simulate the product of linear transformations. The product
$\mathbf{T}\mathbf{S}$ was defined as a successive mapping, and this is possible
only when $\mathbf{T}$ maps $\mathcal{V}_m$ into $\mathcal{W}_n$ and $\mathbf{S}$
maps $\mathcal{W}_n$ into $\mathcal{Y}_p$. The corresponding matrix product AB
should be defined whenever the number of columns of A equals the number of rows of
B, since each represents the dimension of $\mathcal{W}_n$. The exact form of the
multiplication is indicated by Exercise 4, §3.7, which calculates the
representation of $\mathbf{T}\mathbf{S}$ in terms of bases for $\mathcal{V}_m$,
$\mathcal{W}_n$, and $\mathcal{Y}_p$.

Definition 4.5. If $A = (a_{ik})$ is an m × n matrix and if $B = (b_{kj})$ is
an n × p matrix, the product AB is the m × p matrix $C = (c_{ij})$, where

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

for every $i = 1, 2, \ldots, m$ and every $j = 1, 2, \ldots, p$.

Some of the time we shall be concerned with transformations of a space 
into itself. The corresponding matrices will be square, n rows and n columns, 
if n is the dimension of the space. Thus if A and B are square matrices of 
the same dimension, then AB and BA are both defined, but not necessarily 
equal. 

We refer to Definition 4.5 to learn a technique for matrix multiplication.
Each element in the product AB of an m X n matrix A by an n X p matrix B
is the sum of n products of scalars. To find the element $c_{ij}$ which is in the
ith row and jth column of AB we multiply each element in the ith row of A by
the corresponding element in the jth column of B, and then add:

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$



In practice we can perform this computation easily by the technique of using 
the left index finger to run across the ith row of the left-hand matrix and 
simultaneously using the right index finger to run down the jth column of 
the right-hand matrix, multiplying elements in corresponding positions and 
adding successively the products obtained. An example may help to clarify 
the procedure. 



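If

$$A = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 4 & 7 \\ 5 & 3 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 4 \\ -2 & 3 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} (1)(0) + (0)(0) + (-1)(-2) & (1)(1) + (0)(4) + (-1)(3) \\ (2)(0) + (4)(0) + (7)(-2) & (2)(1) + (4)(4) + (7)(3) \\ (5)(0) + (3)(0) + (0)(-2) & (5)(1) + (3)(4) + (0)(3) \end{pmatrix} = \begin{pmatrix} 2 & -2 \\ -14 & 39 \\ 0 & 17 \end{pmatrix}.$$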
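The row-by-column rule of Definition 4.5 can be sketched in a few lines of Python. The function name `mat_mul` is an assumption of this sketch; the entries of `A` and `B` are read off from the products displayed in the worked example above.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    # c_ij is the sum over k of a_ik * b_kj: run across row i of A
    # and down column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 0, -1],
     [2, 4, 7],
     [5, 3, 0]]
B = [[0, 1],
     [0, 4],
     [-2, 3]]

print(mat_mul(A, B))  # [[2, -2], [-14, 39], [0, 17]]
```

Here BA is not defined, since B has two columns while A has three rows; calling `mat_mul(B, A)` raises an error.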



Exercises 


1. Verify properties 1 through 7 of this section for addition and scalar
multiples of matrices. Deduce that the set of all m X n matrices over a field $\mathcal{F}$ forms
a commutative group relative to addition.

2. Compute AB, AC, $B^2$, BC, CA, given that


A = 


p 

-1 

4 \ 

/ ° 

2N 

1 

0 

-u 

c-l-i 

0 

\o 

3 

l/ 

\ 3 

1J 


Are any other binary products possible for these three matrices? 

3. Referring to Example (c) of § 3.1, recall that $\{\epsilon_1, \epsilon_2\}$ is the basis chosen
for $\mathcal{E}_2$.

(i) Find the matrices A, B, and C which represent $T_1$, $T_2$, and $T_3$,
respectively.

(ii) Calculate AB, BC, CB, and $A^2$, and from the results deduce statements
for matrices which are analogous to the statements observed in the
discussion of these linear transformations.

4. (i) Relative to the preferred basis $\{\epsilon_1, \ldots, \epsilon_n\}$ for $\mathcal{E}_n$, find the matrix
$E_{ij}$ which represents the linear transformation $T_{ij}$, where $(x_1, \ldots, x_n)T_{ij}$
is the n-tuple with $x_i$ in position j and 0 elsewhere.

(ii) Show that any n X n matrix is a linear combination of the
matrices $E_{ij}$, for i, j = 1, . . . , n.

(iii) Show that if a linear combination of the matrices $E_{ij}$ equals Z
then each coefficient of that linear combination is zero.

5. Show that the system of all 1 X 1 matrices over a field $\mathcal{F}$, together
with matrix addition and multiplication, is a field which is isomorphic to $\mathcal{F}$.

6. Prove that the set of all real 2 X 2 matrices of the form

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$$

forms a system which is isomorphic to the field of complex numbers.

7. Prove that the set of all complex 2 X 2 matrices of the form

$$\begin{pmatrix} a + ib & c + id \\ -c + id & a - ib \end{pmatrix}, \qquad i^2 = -1,$$

forms a system which is isomorphic to the algebra of quaternions as described
in Exercise 2, § 3.0.

8. Read “What is a Matrix?” by C. C. MacDuffee, American Mathematical
Monthly, volume 50 (1943), pp. 300-305.


§4.2. Special Types of Matrices 

Before proceeding with a general study of matrix operations, we shall 
consider certain classes of matrices which play an important role in matrix 
theory. The few calculations we have made in the examples and exercises 
have warned us to expect the unexpected; from Exercise 3, § 4.1, we see that 
matrix multiplication is not commutative (even when both AB and BA are 
defined) and also that a product of nonzero matrices can equal the zero 
matrix. 

Identity matrix. We seek a matrix I such that IX = XI = X for every
matrix X. But if X is m X n, I must have m columns in order that IX be
defined and n rows in order that XI be defined. But then IX is n X n, XI
is m X m, and X is m X n. Hence m and n must be equal, so X must be
square, say n X n, and I must be likewise. Thus we must speak of the identity
matrix of dimension n, which is easily seen to be

$$I = (\delta_{ij}) = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix},$$

where $\delta_{ij}$ is the Kronecker delta, defined in § 3.4.


Scalar matrices. Although matrix multiplication is not commutative, the
n X n identity matrix commutes with every n X n matrix. Are there other
square matrices $A = (a_{ij})$ which have the property that AX = XA for every
n X n matrix $X = (x_{ij})$? A straightforward approach would be to determine
scalars $a_{ij}$ which satisfy the $n^2$ equations,

$$\sum_{k=1}^{n} a_{ik}x_{kj} = \sum_{k=1}^{n} x_{ik}a_{kj}, \qquad i, j = 1, \ldots, n.$$

This is a somewhat fearful task. Instead, we argue as follows: let $E_{rs}$ be
the n X n matrix with $e_{rs} = 1$ and $e_{ij} = 0$ if $i \neq r$ or $j \neq s$. If A commutes
with all n X n matrices, then in particular we must have

$$AE_{rs} = E_{rs}A,$$

where every element not in column s of the first matrix is zero, and every
element not in row r of the second matrix is zero. Hence we have $a_{rr} = a_{ss}$,
$a_{ir} = 0$ if $i \neq r$, and $a_{sj} = 0$ if $j \neq s$. Thus if A commutes with all the $E_{rs}$
matrices, A must have the same element k in every position of the main
diagonal and zeros elsewhere:

$$A = \begin{pmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{pmatrix} = kI.$$

Such a matrix is called a scalar matrix, being merely a scalar multiple of I.
Clearly, a scalar matrix commutes with every matrix, so the question is
answered completely.
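The argument with the matrices $E_{rs}$ can be tried out numerically. The following Python sketch is only an illustration: the helper names `unit_matrix` and `commutes_with_all_units` and the two test matrices are assumptions of the sketch, and indices are counted from 0 rather than 1.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def unit_matrix(n, r, s):
    """E_rs: the n x n matrix with 1 in row r, column s and 0 elsewhere."""
    return [[1 if (i, j) == (r, s) else 0 for j in range(n)] for i in range(n)]


def commutes_with_all_units(A):
    """True when A E_rs = E_rs A for every pair (r, s)."""
    n = len(A)
    return all(
        mat_mul(A, unit_matrix(n, r, s)) == mat_mul(unit_matrix(n, r, s), A)
        for r in range(n) for s in range(n)
    )


scalar = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]    # 5I, a scalar matrix
diagonal = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]  # diagonal but not scalar

print(commutes_with_all_units(scalar))    # True
print(commutes_with_all_units(diagonal))  # False
```

The diagonal matrix fails because its diagonal entries differ, which is exactly the condition $a_{rr} = a_{ss}$ derived above.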

Diagonal matrices. Scalar matrices form a subclass of the class of diagonal
matrices, which are defined by the property that $a_{ij} = 0$ if $i \neq j$. Thus zeros
appear everywhere except possibly on the main diagonal. Clearly, the sum
of diagonal matrices is diagonal; so is the product, for if A and B are diagonal
and AB = C, then

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{ii}b_{ij} = a_{ii}b_{ii}\delta_{ij}.$$
Triangular matrices. A still more inclusive class of square matrices is that
for which $a_{ij} = 0$ whenever i > j. Such a matrix is called (upper) triangular
because all the nonzero elements lie on or above the main diagonal:

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.$$

A triangular matrix for which $a_{ii} = 0$ for i = 1, . . . , n is called strictly
triangular. Clearly, any triangular matrix is the sum of a strictly triangular
matrix and a diagonal matrix.


Idempotent matrices. A matrix A is said to be idempotent if and only if
$A^2 = A$. An example other than Z and I is

Nilpotent matrices. A matrix A is said to be nilpotent of index p if $A^p = Z$
but $A^{p-1} \neq Z$. Any strictly triangular matrix is nilpotent. (See Exercise 5.)
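Both definitions are easy to test by direct multiplication. In the Python sketch below, the matrices `E` and `N` and the helper `nilpotent_index` are hypothetical choices made for illustration; they are not the examples of the text.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def nilpotent_index(A):
    """Smallest p >= 1 with A^p = Z, or None if no power up to n vanishes."""
    n = len(A)
    Z = [[0] * n for _ in range(n)]
    P = A
    for p in range(1, n + 1):
        if P == Z:
            return p
        P = mat_mul(P, A)
    return None


E = [[1, 0],
     [0, 0]]          # idempotent, and different from both Z and I
N = [[0, 1, 2, 3],
     [0, 0, 4, 5],
     [0, 0, 0, 6],
     [0, 0, 0, 0]]    # strictly triangular

print(mat_mul(E, E) == E)   # True
print(nilpotent_index(N))   # 4
print(nilpotent_index(E))   # None
```

The search stops at the nth power because a nilpotent n X n matrix always has index at most n.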

Nonsingular matrices. This special type of matrix is of great importance,
since it corresponds to nonsingular linear transformations. We recall from
Theorem 3.6 that a nonsingular linear transformation T from $\mathcal{V}$ into $\mathcal{W}$ is
simply a one-to-one linear mapping of $\mathcal{V}$ onto $\mathcal{R}_T$. Hence, by regarding T as
a linear transformation from $\mathcal{V}$ to $\mathcal{R}_T$, we can represent T by a square matrix.



Definition 4.6. An n X n matrix A is said to be nonsingular if and 
only if a matrix B exists such that 

AB = I. 


Otherwise A is said to be singular. 
First we point out the formal similarity of Definition 4.6 to Definition 3.7
which concerned nonsingular linear transformations. For transformations
we proved in Theorem 3.7 that if TT* = I then T*T = I. Is the corresponding
result valid for matrices? We would like to prove that if AB = I,
then BA = I. In terms of matrix multiplication we would have to prove
that if

$$\sum_{k=1}^{n} a_{ik}b_{kj} = \delta_{ij}, \qquad i, j = 1, 2, \ldots, n,$$

then

$$\sum_{k=1}^{n} b_{ik}a_{kj} = \delta_{ij}, \qquad i, j = 1, 2, \ldots, n,$$

which is true but by no means obvious. However, after the discussion of the
following section we prove this result merely by pointing to Theorem 3.7.
Anticipating this proof, we call B the inverse of A, denoted $A^{-1}$. Thus nonsingular
matrices are those which possess a multiplicative inverse.
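A single numerical instance proves nothing in general, but it shows what the claim says. In this Python sketch the matrices `A` and `B` are hypothetical and were chosen so that AB = I.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    """The n x n identity matrix (delta_ij)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[2, 1],
     [5, 3]]
B = [[3, -1],
     [-5, 2]]

print(mat_mul(A, B) == identity(2))  # True
print(mat_mul(B, A) == identity(2))  # True
```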

For the next two types of matrices considered here, we need the notion of 
the transpose of A, which is simply the matrix obtained by reflecting A 
across its main diagonal. 

Definition 4.7. Let $A = (a_{ij})$ be any m X n matrix. The transpose of
A, denoted A', is the n X m matrix defined by

$$A' = (b_{rs}),$$

where $b_{rs} = a_{sr}$.


Theorem 4.1. Let $\{\alpha_1, \ldots, \alpha_m\}$ be a basis for $\mathcal{V}_m$ and $\{\beta_1, \ldots, \beta_n\}$
be a basis for $\mathcal{W}_n$; let $\{f_1, \ldots, f_m\}$ and $\{g_1, \ldots, g_n\}$ be the corresponding
dual bases for $\mathcal{V}'_m$ and $\mathcal{W}'_n$, respectively. If a linear transformation T
from $\mathcal{V}_m$ to $\mathcal{W}_n$ is represented relative to the $\alpha$, $\beta$ bases by a matrix A,
then the transpose transformation T' from $\mathcal{W}'_n$ to $\mathcal{V}'_m$ is represented
relative to the g, f bases by the transpose matrix A'.

proof: From § 3.4 we recall that the dual basis for $\mathcal{V}'_m$ is defined by
$\alpha_i f_k = \delta_{ik}$ for i, k = 1, 2, . . . , m; similarly, the dual basis for $\mathcal{W}'_n$ is defined
by $\beta_k g_j = \delta_{kj}$ for k, j = 1, 2, . . . , n. The transpose T' of the transformation
T is the linear mapping from $\mathcal{W}'_n$ to $\mathcal{V}'_m$ defined in § 3.5 by

$$g_j T' = T g_j, \qquad j = 1, 2, \ldots, n.$$

Relative to the g, f bases, T' is represented by the n X m matrix
$B = (b_{jk})$, determined by the equations

$$g_j T' = \sum_{k=1}^{m} b_{jk} f_k, \qquad j = 1, 2, \ldots, n.$$

Thus,

$$\alpha_i (g_j T') = (\alpha_i T) g_j = \Bigl(\sum_{k=1}^{n} a_{ik}\beta_k\Bigr) g_j = \sum_{k=1}^{n} a_{ik}(\beta_k g_j) = \sum_{k=1}^{n} a_{ik}\delta_{kj} = a_{ij}.$$

However,

$$\alpha_i (g_j T') = \alpha_i \Bigl(\sum_{k=1}^{m} b_{jk} f_k\Bigr) = \sum_{k=1}^{m} b_{jk}(\alpha_i f_k) = \sum_{k=1}^{m} b_{jk}\delta_{ik} = b_{ji}.$$

Hence $b_{ji} = a_{ij}$, so B = A'.

In Exercise 7 you may prove that (A')' = A, (A + B)' = A' + B', and
(AB)' = B'A'. Notice in particular that the transpose of a product of matrices
is the product of the transposes in the reverse order.
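The reversal of order in the transpose of a product can be checked on a small case. The helper names and the matrices in this Python sketch are assumptions made for illustration.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Reflect A across its main diagonal: entry (r, s) becomes entry (s, r)."""
    return [list(col) for col in zip(*A)]


A = [[1, 0, -1],
     [2, 4, 7]]       # 2 x 3
B = [[0, 1],
     [0, 4],
     [-2, 3]]         # 3 x 2

print(transpose(mat_mul(A, B)))                # [[2, -14], [-2, 39]]
print(mat_mul(transpose(B), transpose(A)))     # [[2, -14], [-2, 39]]
```

Note that the product A'B' is also defined here, but it is a 3 X 3 matrix and so cannot equal (AB)', which is 2 X 2.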

Symmetric matrices . An n X n matrix A is said to be symmetric if and only 
if A = A'. Obviously any diagonal matrix is symmetric. 

Skew-symmetric matrices. An n X n matrix A is said to be skew-symmetric
(or simply skew) if and only if A' = -A. This implies of course that if
$1 + 1 \neq 0$ in the base field, then every diagonal element of a skew matrix is
zero. We exclude from our consideration any field in which 1 + 1 = 0.
(See § 1.6.)

Now let A be any square matrix. If $1 + 1 \neq 0$ in $\mathcal{F}$,

$$A = \tfrac{1}{2}(A + A') + \tfrac{1}{2}(A - A').$$

Since A + A' is symmetric and A - A' is skew (see Exercise 8), this expresses
A as a sum of two matrices, the first of which is symmetric and the
second skew.
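The decomposition can be computed directly. This Python sketch works over the rationals by means of the standard `fractions` module; the function name `sym_skew_parts` and the matrix `A` are assumptions of the sketch.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def sym_skew_parts(A):
    """Return (S, K) with S = (A + A')/2 symmetric and K = (A - A')/2 skew."""
    At = transpose(A)
    n = len(A)
    half = Fraction(1, 2)
    S = [[half * (A[i][j] + At[i][j]) for j in range(n)] for i in range(n)]
    K = [[half * (A[i][j] - At[i][j]) for j in range(n)] for i in range(n)]
    return S, K


A = [[1, 2],
     [4, 3]]
S, K = sym_skew_parts(A)

print(S == transpose(S))                                   # True: S is symmetric
print(K == [[-x for x in row] for row in transpose(K)])    # True: K is skew
print([[s + k for s, k in zip(rs, rk)] for rs, rk in zip(S, K)] == A)  # True
```

For this A the parts are S = (1 3; 3 3) and K = (0 -1; 1 0), and the diagonal of K is zero as the text requires.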

Row vectors and column vectors. Relative to a fixed basis, every vector of
$\mathcal{V}_n$ has a unique representation as an n-tuple of scalars, $(a_1, \ldots, a_n)$. Except
for the presence of commas, this is formally the same as a matrix of one row
and n columns. Accordingly, a 1 X n matrix is called a row vector. The transpose
of a row vector is an n X 1 matrix, of n rows and one column, and is
called a column vector. If A is a row vector, where $A = (a_1\ a_2\ \ldots\ a_n)$, and if
B' is a column vector, where $B = (b_1\ b_2\ \ldots\ b_n)$, then both AB' and B'A are
defined:

$$AB' = (a_1\ a_2\ \cdots\ a_n)\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = (a_1b_1 + a_2b_2 + \cdots + a_nb_n),$$

which is a 1 X 1 matrix, or by Exercise 5, § 4.1, a scalar; and

$$B'A = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}(a_1\ a_2\ \cdots\ a_n) = \begin{pmatrix} b_1a_1 & b_1a_2 & \cdots & b_1a_n \\ b_2a_1 & b_2a_2 & \cdots & b_2a_n \\ \vdots & \vdots & & \vdots \\ b_na_1 & b_na_2 & \cdots & b_na_n \end{pmatrix},$$

which is an n X n matrix.
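The two products of a row vector with a column vector differ sharply in shape. In this Python sketch the vectors are hypothetical; a row vector is stored as a 1 X n list of rows and a column vector as an n X 1 list of rows.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3]]   # row vector, 1 x 3
B = [[4, 5, 6]]   # row vector, 1 x 3; its transpose B' is a column vector

print(mat_mul(A, transpose(B)))   # [[32]]
print(mat_mul(transpose(B), A))   # [[4, 8, 12], [5, 10, 15], [6, 12, 18]]
```

The first product is the 1 X 1 matrix holding the sum of products of corresponding components; the second has $b_i a_j$ in row i, column j.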


Exercises 

1. Prove that IA = AI = A for every square matrix A.

2. Show that for each n the set of all n X n scalar matrices over $\mathcal{F}$ forms
a field which is isomorphic to $\mathcal{F}$.

3. (i) Prove that all n X n diagonal matrices commute. 

(ii) Prove that if A commutes with all n X n diagonal matrices, then 
A is diagonal. 

4. Prove that the set of all n X n triangular matrices is closed under 
matrix sum and product. Deduce that the set of all nonsingular n X n 
triangular matrices forms a multiplicative group. 

5. Show that any 4 X 4 strictly triangular matrix is nilpotent. How
would you generalize your proof for the n X n case?

6. Prove that if A is idempotent and $A \neq I$, then A is singular.

7. Prove that

(i) (A')' = A,

(ii) (A + B)' = A' + B',

(iii) (AB)' = B'A' if AB is defined.

8. Prove for every square matrix A that

(i) AA' is symmetric,

(ii) A + A' is symmetric,

(iii) A - A' is skew.

9. Prove that $A^2$ is symmetric if either A is symmetric or A is skew.

10. If A and B are both symmetric, prove that

(i) A + B is symmetric,

(ii) AB is symmetric if and only if A and B commute.

11. If A and B are skew, prove that A + B is skew.

12. Prove $(A')^{-1} = (A^{-1})'$ if A is nonsingular.

13. Let $\{\alpha_1, \alpha_2, \alpha_3\}$ be a basis for $\mathcal{V}_3$, and let

$$\beta_1 = \alpha_1 - 2\alpha_2,$$

$$\beta_2 = \alpha_1 + \alpha_2 + \alpha_3,$$

$$\beta_3 = \alpha_2 - \alpha_3.$$

(i) Prove $\{\beta_1, \beta_2, \beta_3\}$ is a basis, and express each $\alpha_i$ as a linear combination
of the $\beta_j$.

(ii) If T is defined by $\alpha_i T = \beta_i$, find the matrix A which represents T
relative to the $\alpha$-basis.

(iii) If S is defined by $\beta_i S = \alpha_i$, find the matrix B which represents S
relative to the $\beta$-basis.

(iv) Prove by matrix calculations that AB = I.

§4.3. Fundamental Isomorphism Theorem 

We might expect that the next step in our development of the theory of 
matrices would be to establish various properties of matrix operations. This 
is correct, but instead of verifying such properties directly we prove a theorem 
which clarifies the connection between matrices and linear transformations. 
This connection is an isomorphism, and therefore we can obtain properties 
of matrices from theorems already proved about linear transformations. 
Not only does this method avoid duplication of effort, but it substitutes 
geometric insight for involved arithmetic calculations. 

From Theorem 3.16 we know that the set of all linear transformations on an
n-dimensional vector space over $\mathcal{F}$ forms a linear algebra

$$\mathcal{L} = \{L, \mathcal{F}; +, \cdot, \oplus, \odot, \boxdot\}$$

over $\mathcal{F}$. We wish to show that the set of all n X n matrices, together with
the matrix operations, form a linear algebra $\mathcal{M}$ which is isomorphic to $\mathcal{L}$.
Since a linear algebra is a vector space on which is defined a suitable multiplication
of vectors, the concept of isomorphism of linear algebras is defined
to be a vector space isomorphism which also preserves the multiplication of
vectors.

Theorem 4.2. Let $\mathcal{V}$ be an n-dimensional vector space over $\mathcal{F}$. The set
of all n X n matrices over $\mathcal{F}$ forms a linear algebra $\mathcal{M}$ which is isomorphic
to the linear algebra $\mathcal{L}$ of all linear transformations on $\mathcal{V}$.

proof: Fix a basis $\{\alpha_1, \ldots, \alpha_n\}$ for $\mathcal{V}$. Let $A = (a_{ij})$ be any n X n
matrix over $\mathcal{F}$. With A we associate the linear transformation T of $\mathcal{V}$
into $\mathcal{V}$ defined by

$$\alpha_i T = \sum_{j=1}^{n} a_{ij}\alpha_j, \qquad i = 1, \ldots, n.$$

It was established in Theorem 3.17 that this correspondence is one-to-one,
provided we regard a fixed ordering of the basis vectors. We next
show that the correspondence preserves the three operations of scalar
multiplication, matrix addition, and matrix multiplication. The matrix
$kA = (ka_{ij})$ determines the linear transformation $S_1$ defined by

$$\alpha_i S_1 = \sum_{j=1}^{n} (ka_{ij})\alpha_j = k\sum_{j=1}^{n} a_{ij}\alpha_j = k(\alpha_i T) = \alpha_i(kT),$$

so if A corresponds to T, kA corresponds to kT. Let $A = (a_{ij})$ correspond
to T, and let $B = (b_{ij})$ correspond to U. Then $A + B = (a_{ij} + b_{ij})$
corresponds to the transformation $S_2$ defined by

$$\alpha_i S_2 = \sum_{j=1}^{n} (a_{ij} + b_{ij})\alpha_j = \sum_{j=1}^{n} a_{ij}\alpha_j + \sum_{j=1}^{n} b_{ij}\alpha_j = \alpha_i T + \alpha_i U = \alpha_i(T + U),$$

so A + B corresponds to T + U. Finally, the matrix $AB = (c_{ij})$, where
$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$, corresponds to the transformation $S_3$ defined by

$$\alpha_i S_3 = \sum_{j=1}^{n}\Bigl(\sum_{k=1}^{n} a_{ik}b_{kj}\Bigr)\alpha_j = \sum_{k=1}^{n}\sum_{j=1}^{n} a_{ik}b_{kj}\alpha_j = \sum_{k=1}^{n} a_{ik}\Bigl(\sum_{j=1}^{n} b_{kj}\alpha_j\Bigr)$$

$$= \sum_{k=1}^{n} a_{ik}(\alpha_k U) = \Bigl(\sum_{k=1}^{n} a_{ik}\alpha_k\Bigr)U = \alpha_i TU.$$

Hence AB corresponds to TU, and the correspondence is an isomorphism.
Theorem 3.16 showed that $\mathcal{L}$ is a linear algebra; hence $\mathcal{M}$, the set of all
n X n matrices over $\mathcal{F}$, forms a linear algebra (since $\mathcal{M}$ and $\mathcal{L}$ are isomorphic),
and the proof is complete.
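The last step of the proof, that AB corresponds to TU, can be illustrated in coordinates. The Python sketch below assumes the right-hand convention of the text: if the row `X` holds the coordinates of a vector relative to the fixed basis, then by linearity the coordinates of its image under T are given by the row XA. The function name `apply` and the sample data are assumptions of the sketch.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(X, A):
    """Coordinates of the image of the vector with coordinate row X
    under the transformation represented by A (right-hand notation)."""
    return [sum(X[i] * A[i][j] for i in range(len(X))) for j in range(len(A[0]))]


A = [[1, 2],
     [0, 1]]     # represents T
B = [[0, 1],
     [1, 1]]     # represents U
X = [3, -1]      # coordinates of some vector

step_by_step = apply(apply(X, A), B)     # first T, then U
via_product = apply(X, mat_mul(A, B))    # the single transformation TU

print(step_by_step)  # [5, 8]
print(via_product)   # [5, 8]
```

Applying T and then U agrees with applying the one matrix AB, which is the content of the correspondence between TU and AB.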

Recall that Theorem 3.16 stated that $\mathcal{L}$ is of dimension $n^2$, but we deferred
the proof that the $n^2$ linear transformations $T_{ij}$ form a basis for $\mathcal{L}$. $T_{ij}$ was
defined by

$$(x_1, \ldots, x_n)T_{ij} = (0, \ldots, 0, x_i, 0, \ldots, 0),$$

where the jth component is $x_i$. Now the matrix corresponding to $T_{ij}$ is the
matrix $E_{ij}$ in which every element is zero except the element in the ith row
and jth column, which is 1. It is clear that any matrix is a linear combination
of the $E_{ij}$, i, j = 1, . . . , n, and that the $E_{ij}$ are linearly independent (see Exercise
4, § 4.1). Hence they form a basis for $\mathcal{M}$, and by the isomorphism the $T_{ij}$
form a basis for $\mathcal{L}$. This completes the proof of Theorem 3.16.

The argument of the preceding paragraph indicates the power and useful- 
ness of Theorem 4.2. To prove a result about linear transformations we may, 
if convenient, prove the corresponding result about matrices, and vice versa. 
In the next section we list a number of theorems about matrices which have 
already been proved in the language of linear transformations. 

However, we note that Theorem 4.2 concerns only n X n matrices; for the
corresponding theorem for m X n matrices we recall from Theorem 3.17
that a linear transformation from $\mathcal{V}_m$ to $\mathcal{W}_n$ is represented, relative to a pair
of bases, by an m X n matrix. Also, from Theorem 3.1, the set of all linear
transformations from $\mathcal{V}_m$ to $\mathcal{W}_n$ forms a vector space. It is not difficult to
verify that the dimension of this vector space is mn, and that the set of all
m X n matrices forms a vector space isomorphic to it.


Exercises 


1. Describe the linear transformation on $\mathcal{E}_2$ which is represented by the
scalar matrix aI in case


(i) a > 1, 

(ii) 0 < a < 1, 

(iii) a < 0. 


2. Describe the linear transformation on $\mathcal{E}_2$ which is represented by each
of the following matrices:


(i) 

( : .:> 

(ii) 

(-: -:> 

(iii) 

c ;> 

(iv) 

c ;> 

(v) 

(-i .;> 

3. (i) Determine all possible 2 X 2 real matrices which represent idempotent
linear transformations on $\mathcal{E}_2$.

(ii) Determine all possible 2 X 2 real matrices which represent nilpotent
linear transformations of index 2 on $\mathcal{E}_2$.

4. Prove that the set of all m X n matrices over $\mathcal{F}$ forms a vector space
of dimension mn which is isomorphic to the space of all linear transformations
from $\mathcal{V}_m$ to $\mathcal{W}_n$.


§4.4. Rank of a Matrix 

Referring now to § 4.1, we see that properties 1 through 7 of matrices all 
follow by the isomorphism theorem from the corresponding properties of 
linear transformations. (See Exercise 1, § 4.1, which asked for direct matrix 
proofs.) In the same way, other properties of matrices can be derived from 
the corresponding properties of linear transformations. Before doing this 
we introduce the notion of the rank of a matrix. 

First recall that the vector $\xi = \sum_{i=1}^{m} x_i\alpha_i$ can be represented relative to
the $\alpha$-basis as $\xi = (x_1, \ldots, x_m)$. This representation defines a 1 X m matrix
(row vector) $X = (x_1 \cdots x_m)$. If $A = (a_{ij})$ is m X n, then the product XA
is defined, and

$$XA = (y_1 \cdots y_n), \quad \text{where } y_j = \sum_{i=1}^{m} x_i a_{ij}.$$

Let $\eta = \sum_{j=1}^{n} y_j\beta_j$ and let T be the transformation which corresponds to A,
relative to the fixed bases $\{\alpha_1, \ldots, \alpha_m\}$ and $\{\beta_1, \ldots, \beta_n\}$. Then

$$\xi T = \Bigl(\sum_{i=1}^{m} x_i\alpha_i\Bigr)T = \sum_{i=1}^{m} x_i(\alpha_i T) = \sum_{i=1}^{m} x_i\Bigl(\sum_{j=1}^{n} a_{ij}\beta_j\Bigr) = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{m} x_i a_{ij}\Bigr)\beta_j = \sum_{j=1}^{n} y_j\beta_j = \eta.$$

Hence if X represents the vector $\xi$ and A represents the transformation T, then
XA represents the vector $\xi T$. This is a generalization of the observation that
the image $\alpha_j T$ of the jth vector of the basis is represented relative to the
$\beta$-basis by the row vector which is the jth row of the matrix A.

Had we chosen left-hand notation for linear transformations, the situation
would be different. (See the remarks following Definition 3.1 and those preceding
Definition 4.1.) Relative to the $\alpha$, $\beta$ bases the vector $\xi$ would be represented
by the column vector X'; T would be represented by the n X m
matrix A'; $T(\alpha_i)$ would be represented by the ith column vector of A'; and
$T(\xi)$ would be represented by A'X' = (XA)'. Thus one is able to pass from
one notation to the other simply by taking transposes of appropriate matrices.
The simplicity of representing T by the array of scalars as they stand in the
arrangement (4.1) is one reason for preferring right-hand notation.
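The passage between the two notations amounts to the identity A'X' = (XA)'. The following Python sketch checks it on hypothetical data; the helper names are assumptions of the sketch.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


X = [[3, -1]]          # row vector, 1 x 2 (right-hand notation)
A = [[1, 0, -1],
     [2, 4, 7]]        # 2 x 3

right_hand = mat_mul(X, A)                        # XA, a row vector
left_hand = mat_mul(transpose(A), transpose(X))   # A'X', a column vector

print(right_hand)                          # [[1, -4, -10]]
print(left_hand)                           # [[1], [-4], [-10]]
print(left_hand == transpose(right_hand))  # True
```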

Definition 4.9. The rank of a matrix A, denoted $\rho(A)$, is the maximal
number of linearly independent row vectors of A.

An immediate question is the following: What is the relation (if any)
between the rank of a matrix and the rank of the linear transformation it
represents relative to a chosen basis? Notice that $\rho(A)$ as defined is an intrinsic
property of the matrix A and is independent of any bases we choose
in order to represent A as a linear transformation. Likewise, the rank of a
linear transformation is defined to be the dimension of its range space and
thus is an intrinsic property of the transformation, independent of any choice
of bases.

Theorem 4.3. Let T be the transformation corresponding to the m X n
matrix A relative to chosen bases. Then

$$\rho(T) = \rho(A).$$

proof: Let $\{\alpha_1, \ldots, \alpha_m\}$ be any basis for $\mathcal{V}_m$. Any vector in $\mathcal{R}_T$ is
a linear combination of the row vectors of A, since if $\xi = \sum_{i=1}^{m} c_i\alpha_i$,
$\xi T = \sum_{i=1}^{m} c_i(\alpha_i T)$. Hence any maximal independent subset of the row
vectors of A is a basis for $\mathcal{R}_T$.

Theorem 4.4. For any matrix A, $\rho(A) = \rho(A')$.

proof: Given an m X n matrix A, choose vector spaces $\mathcal{V}_m$ and $\mathcal{W}_n$
and a pair of bases, and let T be the linear mapping from $\mathcal{V}_m$ to $\mathcal{W}_n$ represented
by A relative to that pair of bases. By Theorem 4.1, the transpose
transformation T' from $\mathcal{W}'_n$ to $\mathcal{V}'_m$ is represented by A' relative
to the corresponding pair of dual bases. By Theorem 4.3, we have
$\rho(T) = \rho(A)$ and $\rho(T') = \rho(A')$, and by Theorem 3.15, $\rho(T) = \rho(T')$.
Hence any matrix and its transpose have the same rank; this means
that in any matrix the number of linearly independent row vectors equals
the number of linearly independent column vectors.
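The equality of row rank and column rank can be observed numerically. The Python sketch below counts independent rows by row reduction with exact rational arithmetic; row reduction is a technique brought in for this illustration and has not been introduced in the text at this point. The function name `rank` and the matrix `A` are assumptions of the sketch.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def rank(A):
    """Number of linearly independent rows, found by row reduction."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivot rows found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]   # the second row is twice the first

print(rank(A))             # 2
print(rank(transpose(A)))  # 2
```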

We conclude this section with a list of theorems, all of which are readily 
proved by the isomorphism theorem and a corresponding theorem for linear 
transformations. 

Theorem 4.5. Matrix multiplication is associative and bilinear. 

Theorem 4.6. If A and B are n X n matrices such that AB = I, then
BA = I.

Theorem 4.7. The following are equivalent for an n X n matrix A:

(a) A is nonsingular,

(b) $\rho(A) = n$,

(c) The row vectors of A are linearly independent. 





Theorem 4.8. If A is a nonsingular n X n matrix, B an m X n matrix,
and C an n X p matrix, then $\rho(BA) = \rho(B)$ and $\rho(AC) = \rho(C)$.

Theorem 4.9. Let A and B be n X n matrices. AB is nonsingular if
and only if A and B are both nonsingular. If AB is nonsingular, then
$(AB)^{-1} = B^{-1}A^{-1}$.

Exercises 

1. Using the isomorphism theorem and appropriate theorems for linear
transformations, prove Theorems 4.5-4.9.

2. Prove directly by matrix calculations that 

A(bB + cC) = bAB + cAC 
for all n X n matrices A, B, C. 

3. Prove that a triangular matrix is nonsingular if and only if every 
diagonal element is different from zero. 

4. Let $\xi \in \mathcal{V}_n$, $\xi \neq \theta$. Prove that the set of all linear transformations T
on $\mathcal{V}_n$ such that $\xi T = \theta$ forms a linear algebra whose dimension is $n^2 - n$.

5. Let A, B be n X n matrices. What statements can you make about
$\rho(A + B)$ and $\rho(AB)$?

6. State as a theorem for matrices the assertion of Theorem 3.6, statements
(a), (b), and (c).

7. State as a theorem for matrices the assertion of Exercise 7, § 3.2. 

8. An n X n Markov matrix is defined to be any n X n real matrix
$A = (a_{ij})$ which satisfies the two properties

$$0 \le a_{ij} \le 1,$$

$$\sum_{j=1}^{n} a_{ij} = 1 \quad \text{for } i = 1, 2, \ldots, n.$$

Prove that the product of two Markov matrices is a Markov matrix. Do 
such matrices form a multiplicative group? 

9. A Markov (stochastic) matrix is called doubly stochastic if the sum of 
the elements in each column is unity. Is the product of two doubly stochastic 
matrices doubly stochastic? 

10. If M is a Markov matrix, show that the value of every element of
column j of $M^2$ is between the values of the minimal and maximal elements
of column j of M.

11. In quantum mechanics the Pauli theory of electron spin makes use of
linear transformations $T_x$, $T_y$, $T_z$, whose complex matrices in the preferred
coordinate system are, respectively,

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

(i) Show that $X^2 = Y^2 = Z^2 = I$, and therefore that each is nonsingular.

(ii) Form a multiplication table of the four matrices I, X, Y, Z, and
observe that any product of these matrices is a scalar times one of these
matrices.

(iii) List the elements of the smallest multiplicative group which contains
X and Y.

12. In the special theory of relativity, use is made of the Lorentz transformation,

$$x' = b(x - vt),$$

$$t' = b\left(-\frac{vx}{c^2} + t\right),$$

where |v| represents the speed of a moving object, c the speed of light, and
$b = c(c^2 - v^2)^{-1/2}$. The corresponding matrix is

$$L(v) = b\begin{pmatrix} 1 & -v/c^2 \\ -v & 1 \end{pmatrix}.$$

(i) Show that L(v) is nonsingular for |v| < c.

(ii) Show that the set of all L(v) for \v\ < c forms a multiplicative 
group. This group is called the Lorentz group. 


§4.5. Block Multiplication of Matrices 

Although our emphasis in this book is on the theory of matrices rather 
than on the practical problems which arise in applications, it would be mis- 
leading to pretend that such problems do not exist. For example, the form 
of the product of two matrices may be unfamiliar to a beginner, but it is 
conceptually simple. In the product of an m X n matrix and an n X p 
matrix there are mp terms to be calculated, and each term requires n binary 
products and $n - 1$ sums. Hence, there are altogether mpn products and
$mp(n - 1)$ sums to be performed. For square matrices, this reduces to $n^3$
products and $n^3 - n^2$ sums.
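These counts can be confirmed by instrumenting the multiplication itself. In this Python sketch the function name `mat_mul_counted` and the sample matrices are assumptions made for illustration.

```python
def mat_mul_counted(A, B):
    """Return (AB, number of scalar products, number of scalar sums)."""
    m, n, p = len(A), len(B), len(B[0])
    mults = adds = 0
    C = []
    for i in range(m):
        row = []
        for j in range(p):
            total = A[i][0] * B[0][j]   # first of the n products
            mults += 1
            for k in range(1, n):       # remaining n - 1 products and n - 1 sums
                total += A[i][k] * B[k][j]
                mults += 1
                adds += 1
            row.append(total)
        C.append(row)
    return C, mults, adds


A = [[1, 0, -1],
     [2, 4, 7],
     [5, 3, 0]]       # m = 3, n = 3
B = [[0, 1],
     [0, 4],
     [-2, 3]]         # n = 3, p = 2

C, mults, adds = mat_mul_counted(A, B)
print(mults, adds)  # 18 12
```

Here mpn = 3 · 2 · 3 = 18 and mp(n - 1) = 3 · 2 · 2 = 12, in agreement with the formulas.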
In matrices which arise from experimental work the individual entries are
decimal numbers which are seldom integral, so that multiplication is considerably
more tedious than addition. For this reason the amount of work
required for a matrix calculation is usually expressed in terms of the number
of multiplications involved. Since the product of two n X n matrices requires
$n^3$ multiplications, it is clear that a tremendous amount of computation
is required when n is large. Even for n = 10 the work is sufficiently long to 
discourage mental computation. The recent development of high-speed com- 
puters has reduced this problem considerably, and thereby has opened to 
solution by matrix methods many applied problems for which theoretical 
solutions were known but were computationally unfeasible. But even a large 
electronic computer has a limited storage space, and the practical question 
of computational technique remains. 

We now indicate a device, known as block multiplication of matrices,
which can be used to decompose the product of two large matrices into numerous
products of smaller matrices. Let A be m X n and B be n X p. Write
$n = n_1 + n_2 + \cdots + n_k$, where each $n_i$ is a positive integer; partition the
columns of A by putting the first $n_1$ columns in the first block, the next $n_2$
columns in the second block, and so on. Partition the rows of B in exactly
the same way. Then

$$A = (A_1 \mid A_2 \mid \cdots \mid A_k), \qquad B = \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_k \end{pmatrix},$$

where $A_i$ is the $m \times n_i$ matrix consisting of columns of A beginning with
column $n_1 + \cdots + n_{i-1} + 1$ and ending with column $n_1 + \cdots + n_i$, and
where $B_j$ is the $n_j \times p$ matrix consisting of rows of B beginning with row
$n_1 + \cdots + n_{j-1} + 1$ and ending with row $n_1 + \cdots + n_j$. Then the method
of block multiplication asserts that

$$AB = A_1B_1 + A_2B_2 + \cdots + A_kB_k.$$

More generally, suppose that having partitioned the columns of A and
the rows of B as described above, we partition the rows of A in any manner
and the columns of B in any manner. We obtain

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{r1} & A_{r2} & \cdots & A_{rk} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1s} \\ B_{21} & B_{22} & \cdots & B_{2s} \\ \vdots & \vdots & & \vdots \\ B_{k1} & B_{k2} & \cdots & B_{ks} \end{pmatrix},$$

where $A_{it}$ is a matrix (rectangular array) having $r_i$ rows and $n_t$ columns and
$B_{tj}$ is a matrix having $n_t$ rows and $s_j$ columns. Then for fixed i, j the product
$A_{it}B_{tj}$ is defined and yields an $r_i \times s_j$ matrix; therefore $\sum_{t=1}^{k} A_{it}B_{tj}$ is an
$r_i \times s_j$ matrix. The method of block multiplication asserts that

$$AB = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1s} \\ \vdots & \vdots & & \vdots \\ C_{r1} & C_{r2} & \cdots & C_{rs} \end{pmatrix}, \quad \text{where } C_{ij} = \sum_{t=1}^{k} A_{it}B_{tj}.$$

This is in the same form as the element-by-element definition of the product
of matrices in which each element is considered as a 1 X 1 block. The important
thing to remember in block multiplication of AB is that the column
partition of A must coincide with the row partition of B, in order that all the
matrix products $A_{it}B_{tj}$ be defined. Since matrix multiplication is noncommutative,
it is essential that the proper order be maintained in forming
products of blocks.
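Block multiplication as described above can be sketched in Python and compared with the ordinary product. All function names (`split`, `join`, `block_mul`), the parameter names, and the sample matrices are assumptions of this sketch; `inner_sizes` plays the role of $n_1, \ldots, n_k$ and must partition both the columns of A and the rows of B.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A, row_sizes, col_sizes):
    """Partition A into a grid of blocks with the given row and column sizes."""
    blocks, r0 = [], 0
    for rs in row_sizes:
        block_row, c0 = [], 0
        for cs in col_sizes:
            block_row.append([row[c0:c0 + cs] for row in A[r0:r0 + rs]])
            c0 += cs
        blocks.append(block_row)
        r0 += rs
    return blocks


def join(blocks):
    """Reassemble a grid of blocks into a single matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out


def block_mul(A, B, row_sizes, inner_sizes, col_sizes):
    """Compute AB block by block: C_ij is the sum over t of A_it B_tj."""
    Ab = split(A, row_sizes, inner_sizes)
    Bb = split(B, inner_sizes, col_sizes)
    Cb = []
    for i in range(len(row_sizes)):
        row = []
        for j in range(len(col_sizes)):
            acc = mat_mul(Ab[i][0], Bb[0][j])
            for t in range(1, len(inner_sizes)):
                acc = mat_add(acc, mat_mul(Ab[i][t], Bb[t][j]))
            row.append(acc)
        Cb.append(row)
    return join(Cb)


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]                 # 3 x 4
B = [[1, 0], [0, 1], [2, -1], [3, 2]]  # 4 x 2

C = block_mul(A, B, row_sizes=[1, 2], inner_sizes=[3, 1], col_sizes=[1, 1])
print(C == mat_mul(A, B))  # True
```

Only one block product and one running sum need be held at a time, which is the point of the device when storage is limited.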

The proof of this result is not difficult, but it does require care in choosing 
notation and manipulating indices. Since we do not require the result for 
the development of theory, a general proof is omitted. To understand the 
application of block multiplication to the problem of large-scale computations, 
consider two 50 X 50 matrices. There are 2500 elements in each matrix; 
the multiplication of two such matrices requires 125,000 multiplications and 
almost as many additions. One method of performing such calculations on a 
computer whose storage capacity is exceeded by the magnitude of the prob- 
lem would be to partition each of the matrices into smaller matrices, perhaps 
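into four 25 X 25 blocks,

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.$$

If the blocks are suitably small, the machine can successively compute the
products $A_{11}B_{11}$, $A_{11}B_{12}$, $A_{12}B_{21}$, $A_{12}B_{22}$, and so on, record the results on punched
cards to clear the machine storage for the next block of calculations, and
finally compute $C_{11} = A_{11}B_{11} + A_{12}B_{21}$, and so on.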

Even for human calculators, block multiplication is useful in case special 
patterns appear in the matrix. For example, let 
and let



where B 0 has two rows and C 0 has three rows. Then 



Hence the only nontrivial computation required for AB is the product of 
the 2X3 matrix A 0 with the 3 X p matrix C 0 . 

To observe that it is sometimes possible to introduce convenient patterns
in a matrix by judicious selection of bases, consider a linear transformation T
from $\mathcal{V}_m$ to $\mathcal{W}_n$. Choose a basis for $\mathcal{N}_T$ and extend it to a basis $\{\alpha_1, \ldots, \alpha_m\}$
for $\mathcal{V}_m$, numbered so that the basis $\{\alpha_{m-\nu+1}, \ldots, \alpha_m\}$ for $\mathcal{N}_T$ appears last.
Now in $\mathcal{W}_n$ choose any basis $\{\beta_1, \ldots, \beta_{m-\nu}\}$ for $\mathcal{R}_T$ and extend it to a basis
$\{\beta_1, \ldots, \beta_n\}$ for $\mathcal{W}_n$. Then relative to these bases T is represented by an
m X n matrix A of the block form

$$A = \begin{pmatrix} B & Z_1 \\ Z_2 & Z_3 \end{pmatrix},$$

where B is $(m - \nu) \times (m - \nu)$, and $Z_1$, $Z_2$, $Z_3$ are zero matrices of dimensions
$(m - \nu) \times (n - m + \nu)$, $\nu \times (m - \nu)$, and $\nu \times (n - m + \nu)$, respectively.

In particular, T is nonsingular if and only if $\nu = 0$. In that case we have

$$A = (B \mid Z_1).$$

The m X m matrix B is nonsingular since its m rows represent the linearly
independent vectors $\alpha_i T$, i = 1, . . . , m. Let C be the n X m matrix defined by

$$C = \begin{pmatrix} B^{-1} \\ Z_4 \end{pmatrix},$$

where $Z_4$ is the $(n - m) \times m$ zero matrix. Then we compute

$$AC = BB^{-1} + Z_1Z_4 = I_m.$$

These calculations should help to make clear the distinction between our
definition of nonsingular linear mappings and that of nonsingular matrices.
If T is a nonsingular linear transformation from $\mathcal{V}_m$ to $\mathcal{W}_n$, then $m \le n$,
and T can be represented by a rectangular matrix consisting of an m X m
block which is a nonsingular matrix and an m X (n - m) block of zeros.
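A small numerical case may make the distinction concrete. In the Python sketch below, with m = 2 and n = 3, the nonsingular block `B_block` and its inverse are hypothetical choices; `A` is built as a nonsingular block followed by a column of zeros, and `C` as the inverse block above a row of zeros.

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


B_block = [[2, 1],
           [5, 3]]        # nonsingular 2 x 2 block
B_inverse = [[3, -1],
             [-5, 2]]     # its inverse

A = [row + [0] for row in B_block]   # 2 x 3: the block followed by zeros
C = B_inverse + [[0, 0]]             # 3 x 2: the inverse above a zero row

print(mat_mul(A, C) == identity(2))  # True
print(mat_mul(C, A))                 # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

The product AC is the 2 X 2 identity, yet CA is not the 3 X 3 identity; A is rectangular, so Definition 4.6 does not apply to it even though the transformation it represents is nonsingular.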

Exercises 

1. Prove the first of the two assertions of the text concerning block multiplication: If $A = (A_1 \mid \cdots \mid A_k)$ and

$$B = \begin{pmatrix} B_1 \\ \vdots \\ B_k \end{pmatrix},$$

then $AB = A_1B_1 + \cdots + A_kB_k$.

2. Calculate AB in three ways: directly without partition; with the parti- 
tion indicated; with a different partition of your own choosing. 


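$$A = \begin{pmatrix} 2 & 3 & 4 & 0 & 0 \\ 3 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 4 \\ -1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 4 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 2 & 1 \\ -1 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}.$$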


3. Suppose an n X n matrix A is of the form 



where Z is a k X (n - k) block of zeros. 

(i) Consider the linear transformation T determined by A relative to
a chosen basis $\{\alpha_1, \ldots, \alpha_n\}$. What is the geometric meaning of the block of
zeros?

(ii) Suppose an n X n matrix B is also of the form described above 
for the matrix A. Prove by block multiplication that AB has this same 
property. 

(iii) Prove the result of (ii) by a geometric argument. 



CHAPTER 5 


Linear Equations 
and Determinants 


§5.1. Systems of Linear Equations

One of the most frequent applications of matrices to modern science arises
from the need to solve a system of linear equations:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= y_m.
\end{aligned} \tag{5.1}$$

Here we consider the mn scalars $a_{ij}$ and the m scalars $y_i$ as fixed. By a solution of the system (5.1) we mean an n-tuple of scalars $x_j$, $j = 1, \ldots, n$, for
which each of the m equations is satisfied. To solve the system means to
find all solutions.

The system can be written in compact form by using matrix notation; let

$$A = (a_{ij}), \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}.$$

Then the system is represented by the single matrix equation,

$$AX = Y. \tag{5.2}$$

By taking the transpose of each side we obtain

$$X'A' = Y', \tag{5.3}$$

which is in the form we have adopted for linear transformations: $X'$ is a row
vector of n components, $A'$ an $n \times m$ matrix, and $Y'$ a row vector of m components. Therefore, if we choose $\{\beta_1, \ldots, \beta_n\}$ as a basis for $\mathcal{W}_n$ and $\{\alpha_1, \ldots, \alpha_m\}$
as a basis for $\mathcal{V}_m$, then $A'$ represents a linear transformation T from $\mathcal{W}_n$ to $\mathcal{V}_m$,
$X'$ a vector $\xi$ in $\mathcal{W}_n$, and $Y'$ a vector $\eta$ in $\mathcal{V}_m$:

$$\xi T = \eta. \tag{5.4}$$

In this section we shall develop the theory concerned with the existence 
and uniqueness of solutions, deferring several observations concerning specific 
methods of obtaining solutions until further properties of matrices are estab- 
lished. We remark first about notation; our choice of right-hand notation 
for linear transformations has led to the necessity of considering transposes 
of the natural arrangement of the scalars of a system of linear equations, in 
order to reduce the system to right-hand matrix notation. Thus, while right- 
hand notation seems preferable for the matrix representation of linear trans- 
formations, left-hand notation is more natural for the matrix representation 
of systems of linear equations. This intrinsic difference is the underlying 
reason for a lack of uniformity in notation for matrix representations. Since 
the passage from either notation to the other is easily performed by means of 
transposes, no real difficulty is encountered in consulting various references, 
provided we remember to ascertain which notation is adopted in each case. 

To return to the problem of solving a system of linear equations, we have
a given linear transformation T from $\mathcal{W}_n$ to $\mathcal{V}_m$, a given vector $\eta \in \mathcal{V}_m$, and
we seek to find all $\xi \in \mathcal{W}_n$ which are mapped by T into $\eta$:

$$\xi T = \eta.$$

In solving this problem we shall consider separately the cases $\eta = \theta$ and
$\eta \ne \theta$; also, some of our conclusions will depend upon the relative magnitudes
of the three positive integers m, n, $\rho(A)$, where (5.1) is a system of m equations
in n unknowns whose matrix of coefficients has rank $\rho(A)$. Since $\rho(A) = \rho(A') = \rho(T)$, $\rho(A)$ cannot exceed either m or n.

The homogeneous case. If $\eta = \theta$, or equivalently $y_1 = y_2 = \cdots = y_m = 0$,
the system is said to be homogeneous. In this case the set of all solutions $\xi$ is
simply the null space $\mathcal{N}_T$, whose dimension is $\nu(T) = n - \rho(T) = n - \rho(A) \ge 0$. The zero vector $\theta$ is always a solution, called the trivial solution.
There will exist nontrivial (nonzero) solutions if and only if $\nu(T) > 0$. This
simple geometric argument has provided a full description of the solution of
a homogeneous system of linear equations.

Theorem 5.1. If $y_1 = y_2 = \cdots = y_m = 0$, the solutions of (5.1) form
a vector space of dimension $n - \rho(A)$. Nontrivial solutions exist if and
only if $n - \rho(A) > 0$. Thus, if $m = n = \rho(A)$, the trivial solution is
the only solution.

For n = 3, the geometric interpretation is that either $x_1 = x_2 = x_3 = 0$ is
the only solution, or that every point on a certain line through the origin is
a solution, or that every point on a certain plane through the origin is a solution. (The case $\rho(A) = 0$ is not considered, since then $a_{ij} = 0$ for all i, j.)
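
Theorem 5.1 can be checked numerically on small examples. The sketch below computes the rank by elimination over the rationals and returns $n - \rho(A)$; it assumes the matrix is given as a list of rows, and the function names are chosen here for illustration only.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def solution_space_dimension(a):
    """Dimension n - rank(A) of the solution space of the homogeneous system."""
    return len(a[0]) - rank(a)
```

For n = 3 the three geometric cases correspond to the values 0, 1, and 2 of `solution_space_dimension`: the origin alone, a line through the origin, or a plane through the origin.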

The nonhomogeneous case. If $\eta \ne \theta$ (or, equivalently, if some $y_j \ne 0$), the
system is said to be nonhomogeneous. In this case the solution is the set of
all vectors $\xi$ which are mapped by T into $\eta$. Since T and $\eta$ are fixed by the
given scalars $a_{ij}$ and $y_j$, there is no assurance that even one solution exists.
Clearly, a solution will exist if and only if $\eta \in \mathcal{R}_T$. But $\mathcal{R}_T$ is spanned by
the rows of $A'$, which are the columns of A. Hence, a solution exists if and
only if the column vector Y is a linear combination of the columns of A. Let
us form a new matrix $A_Y$, called the augmented matrix of the system (5.1), by
adjoining the column vector Y to the matrix A: in partitioned form,

$$A_Y = (A \mid Y).$$

By our previous remark, a solution to (5.1) exists if and only if Y is a linear
combination of the columns of A. Taking transposes, $Y'$ must then be a
linear combination of the rows of $A'$, so

$$\rho(A_Y) = \rho(A'_Y) = \rho(A') = \rho(A).$$

We have proved the following result.

Theorem 5.2. A solution of (5.1) exists if and only if $\rho(A_Y) = \rho(A)$,
where $A_Y$ is the augmented matrix

$$A_Y = \begin{pmatrix} a_{11} & \cdots & a_{1n} & y_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & y_m \end{pmatrix}.$$



Our next theorem describes the set of all solutions of the nonhomogeneous
system.

Theorem 5.3. If $\xi_0$ is a solution of the nonhomogeneous system (5.4),
then $\xi$ is a solution if and only if

$$\xi = \xi_0 + \nu \text{ for some } \nu \in \mathcal{N}_T.$$

proof: Let $\xi$ and $\xi_0$ be solutions. Then $\xi T = \eta = \xi_0 T$. Hence $\xi T - \xi_0 T = (\xi - \xi_0)T = \eta - \eta = \theta$, so $\xi - \xi_0 \in \mathcal{N}_T$. Conversely, if

$$\xi = \xi_0 + \nu \text{ for some } \nu \in \mathcal{N}_T,$$

then

$$\xi T = (\xi_0 + \nu)T = \xi_0 T + \nu T = \eta + \theta = \eta,$$

so $\xi$ is a solution.

The geometric meaning of this theorem is interesting. There may be no
solutions to (5.4). If one solution $\xi_0$ exists, then the set of all solutions is a
translation by $\xi_0$ of the subspace $\mathcal{N}_T$ of all solutions of the associated homogeneous system. Hence, the solution set for n = 3 is void, a single point P,
a line through P, or a plane through P. These are not subspaces in the nonhomogeneous case, because then P is not the origin.
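
The structure described by Theorems 5.2 and 5.3 can be made concrete with a short Python sketch. It works with the column form AX = Y of (5.2) rather than the row form (5.4), reduces the augmented matrix by elimination (a technique the text has not yet developed, used here only as an assumption for illustration), and returns either nothing, when the system is inconsistent, or one particular solution together with a basis of the solutions of the associated homogeneous system. The name `solve` is hypothetical.

```python
from fractions import Fraction


def solve(a, y):
    """Return (particular, basis) for AX = Y, or None if no solution exists."""
    m, n = len(a), len(a[0])
    # Augmented matrix (A | Y) with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(yi)] for row, yi in zip(a, y)]
    pivots = []
    r = 0
    for col in range(n):
        p = next((i for i in range(r, m) if aug[i][col] != 0), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        piv = aug[r][col]
        aug[r] = [x / piv for x in aug[r]]
        for i in range(m):
            if i != r and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * z for x, z in zip(aug[i], aug[r])]
        pivots.append(col)
        r += 1
    # Rows past the pivots have zero coefficients; a nonzero right side there
    # means rank(A | Y) > rank(A), so the system is inconsistent.
    if any(aug[i][n] != 0 for i in range(r, m)):
        return None
    particular = [Fraction(0)] * n
    for i, col in enumerate(pivots):
        particular[col] = aug[i][n]
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, col in enumerate(pivots):
            v[col] = -aug[i][f]
        basis.append(v)  # one solution of the homogeneous system
    return particular, basis
```

Every solution is then the particular solution plus a linear combination of the basis vectors, which is the translated subspace of the geometric description above.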

It is appropriate to comment further about Theorem 5.2. The condition
that the rank of A equal the rank of the augmented matrix $A_Y$ is called the
consistency condition, and a system which satisfies this condition is said to be
consistent. Thus a consistent system is simply one which has a solution. This
is equivalent to saying that any linear dependence of the rows of A produces
an identical dependence of the components of Y. More precisely, if $\gamma_i$ denotes
the ith row of A,

$$\sum_{i=1}^{m} c_i\gamma_i = \theta \text{ only if } \sum_{i=1}^{m} c_i y_i = 0.$$

We conclude with a uniqueness theorem which is valid for both homogeneous and nonhomogeneous systems.

Theorem 5.4. If $m = n = \rho(A)$, there is a unique solution of (5.1).

proof: By hypothesis, A is a square matrix which is nonsingular.
Clearly, $X_0 = A^{-1}Y$ is a solution to (5.2), and for any solution X,
$AX = Y$, so $X = A^{-1}Y = X_0$.

Exercises 

1. Find a necessary and sufficient condition on p(A) that the system (5.2) 
have a solution for all possible choices of Y. Prove your result. 

2. (i) Describe geometrically the solutions of the single equation

$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = y_1.$$

What if $y_1 = 0$?

(ii) Describe geometrically the solutions of a system of two equations
(i = 1, 2) of the type in (i). Need solutions exist? Discuss fully.

(iii) In the nonhomogeneous case with m = n = 3, discuss the geometric meaning of $\rho(A) = 1, 2, 3$, including in your discussion both consistency and nonconsistency for each value of $\rho(A)$.

3. Solve the system

Xi — x 2 + x z — x 4 + x b = 1, 

2xi — x 2 + 3j 3 + = 2, 

3xi — 2x 2 + 2£ 3 + x 4 + x 6 = 1, 

Zl -f X 3 -+■ ZXi -t- £5 

4. Solve the system 

Xi + 2x 2 + x z 

6£i + £2 + £3 

2£i — 3£ 2 — £3 
— £1 — 7 £ 2 — 2£ 3 

£1 - £2 

5. Solve the system 

$$\begin{aligned} 2x_1 + x_2 + 5x_3 &= 4, \\ 3x_1 - 2x_2 + 2x_3 &= 2, \\ 5x_1 - 8x_2 - 4x_3 &= 1. \end{aligned}$$

§5.2. Determinants

It is quite likely that you have encountered determinants in your pre- 
vious study of the solution of a system of n linear equations in n unknowns, 
particularly for the cases n = 2, 3. If so, your estimate of their efficiency as a 
computational device may be unrealistically high, for while determinants are 
manageable enough for low values of n, they become quite unwieldy as n increases. Since more economical methods of solving linear equations are
available, determinants actually have little value as a general technique of
computation. However, they do possess definite value as a theoretical tool,
and for this reason we include a self-contained exposition of the basic prop- 
erties of a determinant. 

Our point of departure may appear at first to be outrageously abstract, 
but we shall soon see that this abstraction pays handsome dividends in the 
simplicity of the proofs of the properties of $n \times n$ determinants. This will be
especially apparent to anyone who has worked through an inductive definition 
of determinants. 

In formulating any definition abstractly we usually are guided by some
knowledge of a special system which we wish to generalize. Here it suffices
to consider the determinant of a $2 \times 2$ matrix,

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

First we recognize that a $2 \times 2$ determinant associates a field element with
each $2 \times 2$ matrix, so this determinant is a function whose domain is the set
of all $2 \times 2$ matrices over a field and whose range is a subset of the field.
This function has many properties: of these we mention four, which are easily
verified.

1. $\begin{vmatrix} a & kb \\ c & kd \end{vmatrix} = k \begin{vmatrix} a & b \\ c & d \end{vmatrix}$.

2. $\begin{vmatrix} a & b + e \\ c & d + f \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a & e \\ c & f \end{vmatrix}$.

3. $\begin{vmatrix} a & a \\ c & c \end{vmatrix} = 0$.

4. $\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1$.


Since all of these properties, except the last, are assertions about columns,
we shall agree to write $A_i$ for the ith column vector of an $n \times n$ matrix; also,
to make clear the separation between columns, we insert commas:

$$A = (A_1, A_2, \ldots, A_n).$$




We are now ready to give an axiomatic definition of determinant. 


Definition 5.1. A function “det” whose domain is the set of all $n \times n$
matrices over $\mathcal{F}$ and whose range is a subset of $\mathcal{F}$ is called a determinant,
provided det satisfies three conditions:

(a) det is a linear function of each column; that is, for any $k = 1, 2, \ldots, n$
and all $b, c \in \mathcal{F}$, if $A_k = bB_k + cC_k$, then

$$\det(A_1, \ldots, bB_k + cC_k, \ldots, A_n) = b \det(A_1, \ldots, B_k, \ldots, A_n) + c \det(A_1, \ldots, C_k, \ldots, A_n);$$

(b) if two adjacent columns of A are equal, det A = 0;

(c) det I = 1, where I is the identity matrix and 1 is the unity element
of $\mathcal{F}$.


Notice that (a) combines the first two properties listed for the 2X2 
example. It is a remarkable fact that we are able to derive most of the es- 
sential properties of determinants from the first and second axioms of Defi- 
nition 5.1. The third axiom is a normalizing assumption which guarantees 
that det is uniquely defined, for we must prove that such a function is unique, 
and even that a function with these properties exists. Three problems are 
of immediate concern: to prove that det exists, to prove that det is uniquely 
determined, and to derive properties of det. We shall consider these problems 
in the reverse order. 
Theorem 5.5. If det is a function with properties (a) and (b) of Definition 5.1, then

(a) $\det(A_1, \ldots, cA_k, \ldots, A_n) = c \det(A_1, \ldots, A_k, \ldots, A_n)$,

(b) $\det(A_1, \ldots, B_k + C_k, \ldots, A_n) = \det(A_1, \ldots, B_k, \ldots, A_n) + \det(A_1, \ldots, C_k, \ldots, A_n)$,

(c) if $A_k = 0$, then det A = 0,

(d) $\det(A_1, \ldots, A_k, \ldots, A_n) = \det(A_1, \ldots, A_k + cA_{k+1}, \ldots, A_n)$,

(e) $\det(A_1, \ldots, A_k, A_{k+1}, \ldots, A_n) = -\det(A_1, \ldots, A_{k+1}, A_k, \ldots, A_n)$,

(f) if $A_j = A_k$ for any $j \ne k$, then det A = 0,

(g) $\det A = \det(A_1, \ldots, A_k + cA_j, \ldots, A_n)$ for any $j \ne k$,

(h) $\det(A_1, \ldots, A_j, \ldots, A_k, \ldots, A_n) = -\det(A_1, \ldots, A_k, \ldots, A_j, \ldots, A_n)$.

proof: Before proving each statement, we translate it into words to
emphasize its meaning.

(a) A common factor of each element of a fixed column may be factored out
as a multiplicative constant. Let b = 0 in Definition 5.1 (a).

(b) If a fixed column of A is written as the sum of two column vectors, the
determinant of A is the sum of the two determinants as indicated. Let
b = c = 1 in Definition 5.1 (a).

(c) If any column of A consists entirely of zeros, then det A = 0. Use
(a) to factor out 0.

(d) Any scalar multiple of a column may be added to an adjacent column
without changing the value of the determinant. Use (b) to expand the
altered determinant into the sum of two determinants. One of these
two is det A; the other is $\det(A_1, \ldots, cA_{k+1}, A_{k+1}, \ldots, A_n) = 0$,
since c factors out and then Definition 5.1 (b) may be applied.

(e) If two adjacent columns are interchanged, the value of the determinant
merely changes sign. First add $A_{k+1}$ to $A_k$, using (d); then subtract
the new kth column from $A_{k+1}$. This gives

$$\begin{aligned} \det A &= \det(A_1, \ldots, A_k + A_{k+1}, -A_k, \ldots, A_n) \\ &= \det(A_1, \ldots, A_{k+1}, -A_k, \ldots, A_n) \\ &= -\det(A_1, \ldots, A_{k+1}, A_k, \ldots, A_n). \end{aligned}$$

The interchange of two adjacent columns is called a transposition.

(f) If any two columns are equal, the determinant is zero. Use (e) repeatedly to bring the equal columns into adjacent position, changing the
sign of the determinant at each transposition. Then apply Definition 5.1 (b).

(g) Any constant multiple of a column may be added to any other column
without changing the value of the determinant. Transpose one of the
two columns repeatedly until it is adjacent to the other, and apply
(d). Then transpose the moving column back to its original position. The number of transpositions needed to do all of this is even,
so the result follows from (e).

(h) If any two columns are interchanged, the determinant merely changes
sign. Transpose $A_j$ repeatedly until it replaces $A_k$. If this requires
p transpositions, then $A_k$ can be moved from its new position (adjacent to its old one) to the original position of $A_j$ in p - 1 transpositions. The interchange can be accomplished in an odd number of
adjacent interchanges, and (e) may be applied.

Exercises 

1. Consider Definition 5.1 for n = 2. Let A, B be any $2 \times 2$ matrices.

(i) Calculate det(BA), using only the properties proved in Theorem
5.5, to obtain an answer in the form k det B, for some scalar k which is a combination of the entries of A.

(ii) Specialize the result in (i) to the case B = I, thus showing that
the specific form stated in the text for $2 \times 2$ matrices is actually a consequence
of Definition 5.1.

2. Show that in $\mathcal{E}_2$ the absolute value of det A is the area of the parallelogram determined by the row vectors of A.

3. If not all of a, b, c, d are zero, consider the system of equations

$$\begin{aligned} ax + by &= e, \\ cx + dy &= f. \end{aligned}$$

(i) Express the consistency condition in determinant form.

(ii) Express the solution, assuming existence and uniqueness, in determinant form.

§5.3. An Explicit Form for det A 

We continue with the program declared in the last section: to investigate
properties of det, and particularly to show that such a function exists and is
uniquely defined by Definition 5.1. The first step is to obtain an explicit
form for det A. For any $n \times n$ matrix B let C = BA; then the kth column
of C is given by $C_k = \sum_{j=1}^{n} B_j a_{jk}$, where $B_j$ is the jth column of B. Hence

$$\det C = \det(C_1, \ldots, C_n) = \det\Bigl(\sum_{j=1}^{n} B_j a_{j1}, \ldots, \sum_{j=1}^{n} B_j a_{jn}\Bigr),$$

where each index of summation runs independently of the others. Each column is the sum of n columns, and we may use Definition 5.1 (a) on each column in succession to expand det C to a sum of $n^n$ determinants:

$$\det C = \sum \det(B_{j_1}a_{j_1 1}, B_{j_2}a_{j_2 2}, \ldots, B_{j_n}a_{j_n n}),$$

where the summation is extended over all possible values of the indices, each
running from 1 to n. By Theorem 5.5 (f), the only nonzero determinants of
this sum are the ones in which $j_1, \ldots, j_n$ are all different; in other words,
the subscripts of the various B’s form a permutation of 1, ..., n. Hence

$$\det C = \sum_{p} \det(B_{p(1)}a_{p(1)1}, \ldots, B_{p(n)}a_{p(n)n}) = \sum_{p} \bigl[a_{p(1)1} \cdots a_{p(n)n} \det(B_{p(1)}, \ldots, B_{p(n)})\bigr],$$

since, for each j, $a_{p(j)j}$ factors out of the jth column. Here the summation is
extended over all permutations p of 1, ..., n. It is well known (see References,
1 or 2) that each permutation p can be classified as even or odd according to
whether p can be represented as a product of an even or an odd number of
transpositions. But each transposition of columns of B produces a change in
the sign of det B. Hence

$$\det(B_{p(1)}, \ldots, B_{p(n)}) = \pm \det B,$$

where the + sign is used if p is an even permutation and the - sign is used
if p is odd. Then we have

$$\det C = \det B \cdot \sum_{\text{all } p} \bigl[\pm a_{p(1)1} \cdots a_{p(n)n}\bigr]. \tag{5.5}$$

Now for the first time we use property (c) of Definition 5.1. Since equation (5.5)
is true for all $n \times n$ matrices A and B, we may specify B = I. Then C =
IA = A, and det B = det I = 1, yielding an expression for det A in terms of
the elements of A.

We have proved several important results, which we now state explicitly. 

Theorem 5.6. If a function det exists with the properties of Definition
5.1, then for every square matrix A,

$$\det A = \sum_{p} \pm\bigl[a_{p(1)1}a_{p(2)2} \cdots a_{p(n)n}\bigr],$$

where the sum is extended over all permutations p of the integers
1, 2, ..., n and where a + or - sign is affixed to each product according
to whether p is even or odd.

Thus det A is an algebraic sum of all products of n terms which can be
formed by selecting exactly one term from each row and each column of A.
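
The formula of Theorem 5.6 translates directly into a short Python sketch. Two choices here are assumptions of the illustration, not of the text: the sign of a permutation is obtained by counting inversions (whose parity agrees with the parity of the number of transpositions), and indices run from 0 instead of 1.

```python
from itertools import permutations


def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_by_permutations(a):
    """Sum over all permutations p of  +/- a[p(0)][0] * ... * a[p(n-1)][n-1]."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for j in range(n):
            term *= a[p[j]][j]  # the entry in row p(j), column j
        total += term
    return total
```

The loop visits n! permutations, so the sketch is practical only for very small n; it is meant to mirror the formula, not to compete with elimination.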

Theorem 5.7. If $A'$ is the transpose of A, then

$$\det A' = \det A.$$

Theorem 5.7 follows from the representation of det A as a sum of signed
products of elements of A, each product containing exactly one element from
each row and each column. Hence in Theorem 5.5 the statements about
columns are valid for rows.

Theorem 5.8. det(AB) = (det A)(det B) = det(BA).
proof: Exercise.


Theorem 5.6 is a uniqueness theorem because it states that any function 
det which satisfies Definition 5.1 must assign to A a value det A which is 
completely described by the elements of A. But we have not yet proved 
that a function exists which has the properties assumed for det. One way to 
settle the existence question is to verify that the specific function described 
by Theorem 5.6 satisfies the three properties of Definition 5.1. Such a proof 
is possible, but an alternative method is chosen here. 

To prove that det exists for every n, we proceed by induction. For n = 1,
A = (a), and we let det A = a. This function trivially satisfies Definition 5.1.
Now assume that such a function exists for square matrices of dimension
n - 1. For an $n \times n$ matrix A, define

$$\det A = \sum_{j=1}^{n} a_{ij}|A_{ij}|,$$

where i is any fixed value 1, 2, ..., n, and $|A_{ij}|$ is the $(n - 1) \times (n - 1)$ determinant obtained by deleting the ith row and the jth column of A and
affixing the sign $(-1)^{i+j}$. Thus,

$$|A_{ij}| = (-1)^{i+j} \det \begin{pmatrix}
a_{11} & \cdots & a_{1,j-1} & a_{1,j+1} & \cdots & a_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{i-1,1} & \cdots & a_{i-1,j-1} & a_{i-1,j+1} & \cdots & a_{i-1,n} \\
a_{i+1,1} & \cdots & a_{i+1,j-1} & a_{i+1,j+1} & \cdots & a_{i+1,n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn}
\end{pmatrix}.$$

We next verify that det has the three properties of Definition 5.1.
(a) First, suppose $A_k = bB_k + cC_k$. Let

$$A = (A_1, \ldots, A_k, \ldots, A_n), \qquad B = (A_1, \ldots, B_k, \ldots, A_n), \qquad C = (A_1, \ldots, C_k, \ldots, A_n),$$

so that B and C coincide with A except in the kth column. If j = k,

$$a_{ik}|A_{ik}| = (bb_{ik} + cc_{ik})|A_{ik}| = bb_{ik}|B_{ik}| + cc_{ik}|C_{ik}|,$$

since $|A_{ik}| = |B_{ik}| = |C_{ik}|$. If $j \ne k$, then $|A_{ij}|$ is $(-1)^{i+j}$ times the determinant of the matrix obtained from A by deleting the ith row and the jth column; the kth column of A survives in this matrix, with entries $a_{rk} = bb_{rk} + cc_{rk}$ for $r \ne i$. By the induction hypothesis, an $(n - 1) \times (n - 1)$
determinant is a linear function of any column, so for $j \ne k$

$$a_{ij}|A_{ij}| = a_{ij}(b|B_{ij}| + c|C_{ij}|) = bb_{ij}|B_{ij}| + cc_{ij}|C_{ij}|,$$

where the last step uses $a_{ij} = b_{ij} = c_{ij}$ for $j \ne k$. Finally,

$$\det A = \sum_{j=1}^{n} a_{ij}|A_{ij}| = \sum_{j=1}^{n} \bigl(bb_{ij}|B_{ij}| + cc_{ij}|C_{ij}|\bigr) = b \det B + c \det C.$$

(b) Next, suppose $A_k = A_{k+1}$ for some k. If $j \ne k$, $j \ne k + 1$, then $|A_{ij}| = 0$,
since two adjacent columns of this determinant are equal. Hence

$$\det A = a_{ik}|A_{ik}| + a_{i,k+1}|A_{i,k+1}| = 0,$$

since $a_{ik} = a_{i,k+1}$ and $|A_{ik}| = -|A_{i,k+1}|$.

(c) Finally, if $A = I_n$,

$$\det I_n = \sum_{j=1}^{n} a_{ij}|A_{ij}| = |A_{ii}| = \det I_{n-1} = 1.$$

Thus a determinant function exists for all n.

While an existence proof is necessary for logical completeness, the fact
that det exists does not surprise us. However, in the proof we have established a useful method of evaluating determinants. In the notation used
above, $|A_{ij}|$ is called the cofactor of $a_{ij}$, and the equation

$$\det A = \sum_{j=1}^{n} a_{ij}|A_{ij}| \text{ for fixed } i$$

is the rule for expanding det A according to the elements of the ith row. The
corresponding result,

$$\det A = \sum_{i=1}^{n} a_{ij}|A_{ij}| \text{ for fixed } j,$$

holds for columns. These results are summarized below.


Definition 5.2. The cofactor $|A_{ij}|$ of $a_{ij}$ in det A is $(-1)^{i+j}$ times the determinant of the matrix obtained by deleting the ith row and jth column of A.


Theorem 5.9.

(a) $\det A = \sum_{j=1}^{n} a_{ij}|A_{ij}|$ for fixed i.

(b) $\det A = \sum_{i=1}^{n} a_{ij}|A_{ij}|$ for fixed j.

Theorem 5.10.

$$\delta_{ik} \det A = \sum_{j=1}^{n} a_{ij}|A_{kj}|,$$

where $\delta_{ik}$ is the Kronecker delta.
proof: Exercise.
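
The row expansion of Theorem 5.9 (a) is naturally recursive, and a minimal Python sketch of it follows. The function names and the 0-based indices are assumptions of this illustration; since $(-1)^{i+j}$ depends only on the parity of i + j, shifting both indices by one leaves the signs unchanged.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def cofactor(a, i, j):
    """Signed cofactor |A_ij| in the sense of Definition 5.2."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def det(a, i=0):
    """Expansion of det A by the elements of row i (Theorem 5.9 (a))."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum(a[i][j] * cofactor(a, i, j) for j in range(n))


def row_times_cofactors(a, i, k):
    """Sum over j of a_ij |A_kj|; by Theorem 5.10 this is delta_ik det A."""
    return sum(a[i][j] * cofactor(a, k, j) for j in range(len(a)))
```

Calling `det(a, i)` with different rows i gives the same value, and `row_times_cofactors(a, i, k)` with i different from k gives zero, which is a quick numerical check of Theorem 5.10 on any small matrix.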


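The following example illustrates some of the properties discussed above.
The common notation which replaces det A by |A| is employed here.

$$\begin{vmatrix} 3 & 1 & -2 & 4 \\ 2 & 0 & -5 & 1 \\ 1 & -1 & 2 & 6 \\ -2 & 3 & -2 & 3 \end{vmatrix}
= \begin{vmatrix} 3 & 1 & -2 & 4 \\ 2 & 0 & -5 & 1 \\ 4 & 0 & 0 & 10 \\ -2 & 3 & -2 & 3 \end{vmatrix} \qquad (\text{add } R_1 \text{ to } R_3)$$

$$= \begin{vmatrix} 3 & 1 & -2 & 4 \\ 2 & 0 & -5 & 1 \\ 4 & 0 & 0 & 10 \\ -11 & 0 & 4 & -9 \end{vmatrix} \qquad (\text{add } -3R_1 \text{ to } R_4)$$

$$= (-1)\begin{vmatrix} 2 & -5 & 1 \\ 4 & 0 & 10 \\ -11 & 4 & -9 \end{vmatrix} \qquad (\text{expand by elements of } C_2)$$

$$= (-1)\begin{vmatrix} 0 & 0 & 1 \\ -16 & 50 & 10 \\ 7 & -41 & -9 \end{vmatrix} \qquad (\text{add } -2C_3 \text{ to } C_1 \text{ and } 5C_3 \text{ to } C_2)$$

$$= (-1)\begin{vmatrix} -16 & 50 \\ 7 & -41 \end{vmatrix} \qquad (\text{expand by elements of } R_1)$$

$$= -[(-16)(-41) - (50)(7)] = -306.$$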


Exercises 

1. Prove Theorem 5.8. 

2. Prove Theorem 5.10. 

3. Illustrate Theorem 5.10 by calculating $\sum_{j=1}^{n} a_{ij}|A_{kj}|$ for i = 1, k = 2,
and for i = 2, k = 2, given



4. By applying the remark which immediately follows Theorem 5.6, 
show that if an n X n matrix can be partitioned in the form 



where E is a square matrix and Z consists entirely of zeros, then det A =
(det E)(det H).

5. (i) How many terms are involved in the representation of det A by
the method of Theorem 5.6? 

(ii) How many multiplications are required to evaluate det A by the 
method of Theorem 5.6? 

(iii) How many multiplications are required to evaluate det A by the
method of Theorem 5.9? 

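6. The Vandermonde matrix of order n is, by definition,

$$V(x_1, \ldots, x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{pmatrix}.$$

(i) For n = 2, 3 verify that $\det V = \prod_{1 \le i < j \le n} (x_j - x_i)$, where $\prod$ denotes “product.”

(ii) Prove this statement for all n > 1.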

§5.4. The Inverse of a Matrix: Adjoint Method

One of the most useful properties of the determinant function is that it
provides a simple characterization of nonsingular matrices. Furthermore,
the cofactors of a nonsingular matrix can be used to calculate its inverse by
a method which is inefficient for large n, but quite easy for n < 4. Other
methods for computing $A^{-1}$ are described in the next chapter.

Theorem 5.11. A is nonsingular if and only if $\det A \ne 0$. If A is nonsingular, $\det(A^{-1}) = (\det A)^{-1}$.

proof: If A is singular, then its rows are linearly dependent, by Theorem 4.7. Using the results of Theorem 5.5 for rows instead of columns,
we can obtain a row of zeros in a determinant whose value is det A.
Hence det A = 0. Conversely, if A is nonsingular, $A^{-1}$ exists and

$$\det A \, \det A^{-1} = \det(AA^{-1}) = \det I = 1,$$

so $\det A \ne 0$.

The first method we describe for calculating $A^{-1}$ is called the adjoint
method. We now define the term adjoint.

Definition 5.3. The adjoint of an $n \times n$ matrix $A = (a_{ij})$ is the $n \times n$
matrix $\operatorname{adj} A = (a^*_{ij})$, where $a^*_{ij} = |A_{ji}|$ = cofactor of $a_{ji}$ in det A.

We note in particular two things about the adjoint: first, the elements of
adj A are determinants formed from A, each with a suitable sign attached
(see Definition 5.2); second, the element in the (i, j) position of adj A is the
cofactor of the element in the (j, i) position of A.

Theorem 5.12. If A is nonsingular, then

$$A^{-1} = [\det A]^{-1} \operatorname{adj} A.$$

proof: We calculate $A \operatorname{adj} A = (b_{ij})$, where

$$b_{ij} = \sum_{k=1}^{n} a_{ik}a^*_{kj} = \sum_{k=1}^{n} a_{ik}|A_{jk}| = \delta_{ij} \det A \text{ by Theorem 5.10.}$$

Hence $A \operatorname{adj} A = (\delta_{ij} \det A) = (\det A)I$, from which the theorem follows.
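
A minimal Python sketch of the adjoint method is given below. It assumes integer entries so that exact fractions can be used for the inverse, uses 0-based indices, and the function names are chosen for this illustration only.

```python
from fractions import Fraction


def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))


def cofactor(a, i, j):
    """Signed cofactor of the entry in row i, column j."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def adjoint(a):
    """adj A: the (i, j) entry is the cofactor of the (j, i) entry of A."""
    n = len(a)
    return [[cofactor(a, j, i) for j in range(n)] for i in range(n)]


def inverse(a):
    """Inverse of A as (det A)^(-1) adj A (assumes integer entries)."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x, d) for x in row] for row in adjoint(a)]
```

Note the transposed indices in `adjoint`: forgetting them yields the matrix of cofactors rather than the adjoint, and the product with A is then no longer (det A)I in general.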


Example 

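Let

$$A = \begin{pmatrix} -2 & 1 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & 0 \end{pmatrix}.$$

Then det A = 8. To compute $a^*_{12}$ we find the cofactor of $a_{21}$:

$$|A_{21}| = (-1)\begin{vmatrix} 1 & 3 \\ 2 & 0 \end{vmatrix} = 6.$$

Similar computations yield

$$\operatorname{adj} A = \begin{pmatrix} -2 & 6 & 4 \\ 1 & -3 & 2 \\ 1 & 5 & 2 \end{pmatrix},$$

and therefore

$$A^{-1} = \begin{pmatrix} -\tfrac{1}{4} & \tfrac{3}{4} & \tfrac{1}{2} \\ \tfrac{1}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \\ \tfrac{1}{8} & \tfrac{5}{8} & \tfrac{1}{4} \end{pmatrix}.$$

It is simpler of course to leave $\tfrac{1}{8}$ factored in front of adj A.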

Now let us return to the problem of solving a system of n linear equations
in n unknowns. In the notation of §5.1, AX = Y, where A is an $n \times n$
matrix which we here assume to be nonsingular. The unique solution is
given by the column vector,

$$X = A^{-1}Y = (\det A)^{-1}(\operatorname{adj} A)Y.$$

Hence

$$x_i = (\det A)^{-1}\sum_{j=1}^{n} a^*_{ij}y_j = (\det A)^{-1}\sum_{j=1}^{n} |A_{ji}|y_j.$$

The terms under the summation sign are easily seen to be the terms of the
expansion by elements of the ith column of the determinant

$$\det A_{(i)} = \det\begin{pmatrix} a_{11} & \cdots & a_{1,i-1} & y_1 & a_{1,i+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,i-1} & y_n & a_{n,i+1} & \cdots & a_{nn} \end{pmatrix},$$

where $A_{(i)}$ agrees with A except in the ith column, where Y has replaced $A_i$.
This result is known as Cramer’s rule.


Theorem 5.13. If the determinant of coefficients of the system of
linear equations

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= y_1, \\ &\;\;\vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n &= y_n \end{aligned}$$

is not zero, then the unique solution is given by

$$x_i = \frac{\det(A_1, \ldots, A_{i-1}, Y, A_{i+1}, \ldots, A_n)}{\det(A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_n)}, \qquad i = 1, 2, \ldots, n.$$
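
Cramer’s rule is easy to state in code. The sketch below is an illustration under stated assumptions: the determinant is evaluated by elimination with exact fractions (row interchanges change the sign, adding a multiple of one row to another changes nothing, as in Theorem 5.5 read for rows), which is a substitution for the cofactor expansion of the text, and the names `det` and `cramer` are chosen here.

```python
from fractions import Fraction


def det(a):
    """Determinant by elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)  # a zero column below the diagonal: singular
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d  # a row interchange changes the sign
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return d


def cramer(a, y):
    """Solve AX = Y by Cramer's rule; A must have nonzero determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant of coefficients is zero")
    solution = []
    for i in range(len(a)):
        # A_(i): the matrix A with its ith column replaced by Y.
        a_i = [row[:i] + [y[k]] + row[i + 1:] for k, row in enumerate(a)]
        solution.append(det(a_i) / d)
    return solution
```

The rule requires n + 1 determinants of order n, which is the source of the cost discussed in the paragraphs that follow.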


The amount of work involved in the solution of a linear system by Cramer’s
rule is the same as in solving by calculating $A^{-1}$ by the adjoint method.
Both methods are unnecessarily cumbersome for large n, and for small values
of n the method of direct algebraic elimination is often simpler than either
the adjoint method or Cramer’s rule.

To underline the practical importance of these remarks, consider the problem of solving a system of 25 linear equations in 25 unknowns. Cramer’s
rule requires the evaluation of 26 determinants of order 25; if either Theorem 5.6 or Theorem 5.9 is used to evaluate these determinants, more than 26!
multiplications are required, a number of the order $10^{26}$. A computer which
performs 1000 multiplications per second would require $10^{16}$ years for the
calculation. However, by making use of other computational techniques,
systems of more than 2000 equations have been solved recently. In the next
section a method is described in which a system of n equations can be solved
by means of only $n^3$ multiplications, and $n^3 < n!$ whenever n > 5.
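
The orders of magnitude quoted above are easy to confirm. The following Python sketch takes the text’s figures (26! multiplications at 1000 multiplications per second) as given; the length of a year in seconds and the function names are assumptions of this illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def years_needed(multiplications, per_second=1000):
    """Years required at the stated rate of multiplications per second."""
    return multiplications / per_second / SECONDS_PER_YEAR


def smallest_n_with_cube_below_factorial():
    """Smallest n for which n**3 < n!."""
    n = 1
    while n ** 3 >= math.factorial(n):
        n += 1
    return n
```

Here `math.factorial(26)` lies between $10^{26}$ and $10^{27}$, `years_needed(math.factorial(26))` is a little over $10^{16}$, and the comparison of $n^3$ with $n!$ first favors $n^3$ at n = 6, since $5^3 = 125 > 120 = 5!$.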


Exercises 

1. Solve the following system of equations, 

•2x! — Xt + 3xj = 3, 

Xt = - 2 , 

2lj + Xt + Ti — 1, 
(i) by the adjoint method of calculating $A^{-1}$,

(ii) by Cramer’s rule, 

(iii) by direct algebraic elimination. 

2. Prove that every square skew-symmetric matrix of odd dimension is 
singular. 

3. In $\mathcal{E}_3$, let $A(a_1, a_2, a_3)$, $B(b_1, b_2, b_3)$, $C(c_1, c_2, c_3)$ be three points, not all on
the same line. Prove that an equation for the plane determined by A, B, and
C is

$$\begin{vmatrix} x_1 & x_2 & x_3 & 1 \\ a_1 & a_2 & a_3 & 1 \\ b_1 & b_2 & b_3 & 1 \\ c_1 & c_2 & c_3 & 1 \end{vmatrix} = 0.$$

4. Prove that $\det(\operatorname{adj} A) = (\det A)^{n-1}$.

5. For what values of x is the following 

/3 - x 2 
( 1 4 — x 

\— 2 —4 

6. Show that A is nonsingular, where 

1° l 1 l"- 1 

2 ° 2 1 2 n ~ 1 


(n — 1)° (n — l) 1 ... (n — 

7. Show that the determinant of a triangular matrix is the product of its 
diagonal elements. Use this fact to solve Exercise 3, § 4.4. 

8. Let A be a singular n X n matrix. 

(i) Prove that if n = 1, 2, then A 2 is proportional to A. 

(ii) Show that A 2 need not be proportional to A whenever n > 2. 

(iii) How are these results related to Exercise 7, § 3.2? 

9. Read “Solving linear equations can be interesting,” by G. E. Forsythe, 
Bulletin of the American Mathematical Society , vol. 59 (1953), pp. 299-329. 



matrix singular? 



§5.5. Operations on Linear Systems

In spite of its apparent simplicity, the solution of a system of linear equa- 
tions by direct algebraic elimination is of such fundamental importance that 
it deserves analysis here. When we first considered the system (5.1) or equivalently (5.2), we regarded the column vector Y as fixed and sought to determine all column vectors X for which AX = Y. In these terms the system
represented m linear equations in n unknowns. Sometimes it is useful to
regard the vector Y as variable, in which case we have a system of m linear
homogeneous equations in m + n variables:

$$-Y + AX = Z.$$

It is convenient to eliminate the minus sign by letting V = -Y, to obtain
in block form

$$(I \mid A)\begin{pmatrix} V \\ X \end{pmatrix} = Z,$$

or, in extended form,

$$\begin{aligned} v_1 + a_{11}x_1 + \cdots + a_{1n}x_n &= 0, \\ v_2 + a_{21}x_1 + \cdots + a_{2n}x_n &= 0, \\ &\;\;\vdots \\ v_m + a_{m1}x_1 + \cdots + a_{mn}x_n &= 0. \end{aligned} \tag{5.6}$$

When any system of m linear equations in m + n unknowns is written in
the form (5.6), it is a trivial matter to solve for the v’s in terms of the x’s.
However, the usual objective is to solve for as many as possible of the x’s
in terms of the v’s and the remaining x’s. The idea which underlies most
methods of solution is akin to horsetrading: to solve (5.6) we exchange this
system for another system which has exactly the same solution but which
is preferable in some sense, until we arrive at a system of the form

$$\begin{aligned}
x_{1'} + b_{11}x_{(r+1)'} + \cdots + b_{1s}x_{n'} + c_{11}v_1 + \cdots + c_{1m}v_m &= 0, \\
&\;\;\vdots \\
x_{r'} + b_{r1}x_{(r+1)'} + \cdots + b_{rs}x_{n'} + c_{r1}v_1 + \cdots + c_{rm}v_m &= 0, \\
0x_{(r+1)'} + \cdots + 0x_{n'} + c_{t1}v_1 + \cdots + c_{tm}v_m &= 0, \\
&\;\;\vdots \\
0x_{(r+1)'} + \cdots + 0x_{n'} + c_{m1}v_1 + \cdots + c_{mm}v_m &= 0,
\end{aligned} \tag{5.7}$$

where $\{1', \ldots, n'\}$ is a permutation of $\{1, \ldots, n\}$, $r \le m$, s = n - r, and
t = r + 1. Of course r is simply the rank of the matrix A of coefficients of
(5.6), and any solution of (5.7) is obtained by assigning to the v’s any set of
values which satisfy the last m - r equations and assigning arbitrary values
to $x_{(r+1)'}, \ldots, x_{n'}$. In case the values of the v’s are specified, (5.7) has a nonvacuous solution if and only if the last m - r equations reduce to 0 = 0.

Now let us formalize these ideas : Two systems of linear equations are said 
to be equivalent if and only if any solution of either system is a solution of the 
other. In manipulating equations we wish to be certain that any operations 
we perform will produce a system which is equivalent to the original system. 
What operations are permissible under this requirement? First, it is apparent
that the solution is not affected by the order in which the equations are
written. Hence any permutation of the arrangement of the equations will 
produce an equivalent system. Second, any equation may be replaced by a 
nonzero scalar multiple of itself. Since such a scalar has a reciprocal, the 
process can be reversed, and so the two systems are equivalent. Third, an 
equation can be replaced by the sum of itself and any other equation in the 
system. In summary, we consider three types of elementary operations: 

P: permutation of any two equations, 

M: multiplication of any equation by a nonzero scalar, 

A: addition of one equation to another. 

It will be noted that the three elementary operations correspond to row 
operations which are useful in evaluating determinants by replacing a given 
determinant by an equal determinant which is simpler. In this connection, 
“simpler” usually means that the new determinant contains more zeros, or at
least a more useful arrangement of zeros. The effect of each of these
operations on a determinant is described by Theorem 5.5 (h), (a), and (g),
interpreted for rows instead of columns. The effect of each of these row operations
on a matrix will be considered in some detail in the next chapter. 

There is another general approach to the solution of (5.1) which leads us to
a different type of operation on matrices and which has recently received
recognition as being the essential arithmetic process of the simplex method
for solving problems in linear programming. The forms of (5.6) and (5.7)
show clearly that the process of solving a system of linear equations is simply
to reverse the roles of the $v$'s and some of the $x$'s. This is so whether the
$v$'s are regarded as known scalars or as variables, and it can be done
systematically, as in elementary algebra, by selecting an equation, solving
that equation for one of the $x_j$ in terms of the $v$'s and the other $x$'s,
and substituting the resulting expression for $x_j$ in all the other equations
of the system.

Specifically, suppose that $a_{ij} \ne 0$ in (5.6). Then

\[
x_j = -a_{ij}^{-1}\,[\,v_i + a_{i1}x_1 + \cdots + a_{i,j-1}x_{j-1}
      + a_{i,j+1}x_{j+1} + \cdots + a_{in}x_n\,].
\tag{5.8}
\]

A new system having the same solutions as (5.6) is obtained as follows:
for $k \ne i$, equation $k$ of the new system is the one obtained by
substituting (5.8) for $x_j$ in equation $k$ of (5.6); equation $i$ of the new
system is (5.8). The new system can be written in the form

\[
\begin{aligned}
v_1 + b_{11}x_1 + \cdots + b_{1j}v_i + \cdots + b_{1n}x_n &= 0 \\
&\;\;\vdots \\
x_j + b_{i1}x_1 + \cdots + b_{ij}v_i + \cdots + b_{in}x_n &= 0 \\
&\;\;\vdots \\
v_m + b_{m1}x_1 + \cdots + b_{mj}v_i + \cdots + b_{mn}x_n &= 0.
\end{aligned}
\]

Note that the roles of $v_i$ and $x_j$ have been interchanged; in matrix form
the original system

\[ (I \mid A) \begin{pmatrix} V \\ X \end{pmatrix} = Z \]

has been replaced by the new system

\[ (I \mid B) \begin{pmatrix} V^* \\ X^* \end{pmatrix} = Z, \]

where $V^*$ coincides with $V$ except that $x_j$ has replaced $v_i$ in the
$i$th component, and $X^*$ coincides with $X$ except that $v_i$ has replaced
$x_j$ in the $j$th component. The entries of the matrix $B$ are given by

\[
\begin{aligned}
b_{rs} &= a_{rs} - a_{ij}^{-1} a_{rj} a_{is}
        = a_{ij}^{-1} \det \begin{pmatrix} a_{rs} & a_{rj} \\ a_{is} & a_{ij} \end{pmatrix}
        && \text{if } r \ne i \text{ and } s \ne j, \\
b_{rj} &= -a_{ij}^{-1} a_{rj} && \text{if } r \ne i \text{ and } s = j, \\
b_{is} &= a_{ij}^{-1} a_{is} && \text{if } r = i \text{ and } s \ne j, \\
b_{ij} &= a_{ij}^{-1} && \text{if } r = i \text{ and } s = j.
\end{aligned}
\]

An operation of this type is called a pivot operation on the nonzero element
$a_{ij}$. Since all steps are reversible, the new system has the same solutions
as the original system. Whenever a succession of $m$ pivot operations replaces
all of the $v$'s by $x$'s, the system is solved. The calculations needed to
perform one pivot operation can be arranged so as to require only one division
and $mn - 1$ multiplications. In particular, if $A$ is $n \times n$ and
nonsingular, $n$ pivot operations will suffice to find $A^{-1}$ with at most
$n$ divisions and $n^3 - n$ multiplications. This shows that pivot operations
form a relatively efficient method of computation, especially for large $n$.
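The four cases of the formula for the entries of $B$ translate directly into a short routine. The following is only a sketch under stated assumptions: the function name `pivot`, the use of exact fractions, and the 0-based indices (the text counts rows and columns from 1) are choices made here for illustration and do not come from the text.

```python
from fractions import Fraction

def pivot(a, i, j):
    """Return the matrix B obtained from A by a pivot operation on a[i][j].

    Illustrative helper: indices are 0-based, entries are exact fractions.
    """
    p = Fraction(a[i][j])
    if p == 0:
        raise ValueError("the pivot element must be nonzero")
    m, n = len(a), len(a[0])
    b = [[Fraction(0)] * n for _ in range(m)]
    for r in range(m):
        for s in range(n):
            if r == i and s == j:
                b[r][s] = 1 / p                               # pivot position
            elif r == i:
                b[r][s] = a[i][s] / p                         # rest of pivot row
            elif s == j:
                b[r][s] = -a[r][j] / p                        # rest of pivot column
            else:
                b[r][s] = a[r][s] - a[r][j] * a[i][s] / p     # all other entries
    return b

# Pivot the 3 x 3 matrix of the example that follows on its (2, 2) entry.
B = pivot([[2, -1, 3], [0, 1, 0], [2, 1, 1]], 1, 1)
```

Written this way the routine divides several times by the pivot; the operation count quoted in the text assumes the reciprocal of the pivot is computed once and reused.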

Example 

To illustrate the use of pivot operations in solving linear systems, we
consider the example of Exercise 1, § 5.4:

\[
\begin{aligned}
2x_1 - x_2 + 3x_3 &= 3, \\
x_2 &= -2, \\
2x_1 + x_2 + x_3 &= 1.
\end{aligned}
\]

A pivot operation on $a_{ij}$ interchanges $v_i$ with $x_j$ and replaces the
matrix $A$ with the matrix $B$. Hence we adopt a computational format which
keeps track of all the essential information. $A$ is written in table form with
each column labeled by the corresponding $x$ and each row by the corresponding
$v$. Each pivot operation will produce a new table of coefficients and
interchange the row and column labels corresponding to the pivot element:
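\[
\begin{array}{c|rrr}
    & x_1 & x_2 & x_3 \\ \hline
v_1 & 2 & -1 & 3 \\
v_2 & 0 & 1^* & 0 \\
v_3 & 2 & 1 & 1
\end{array}
\]

Here $v_1 = -3$, $v_2 = 2$, $v_3 = -1$. A pivot on the starred element solves
for $x_2$ in terms of $x_1$, $x_3$, and $v_2$, and produces the table

\[
\begin{array}{c|rrr}
    & x_1 & v_2 & x_3 \\ \hline
v_1 & 2 & 1 & 3 \\
x_2 & 0 & 1 & 0 \\
v_3 & 2 & -1 & 1^*
\end{array}
\]

A second pivot solves for $x_3$ in terms of $x_1$, $v_2$, and $v_3$ and
produces the following table:

\[
\begin{array}{c|rrr}
    & x_1 & v_2 & v_3 \\ \hline
v_1 & -4^* & 4 & -3 \\
x_2 & 0 & 1 & 0 \\
x_3 & 2 & -1 & 1
\end{array}
\]

A third pivot solves for $x_1$ in terms of $v_1$, $v_2$, and $v_3$:

\[
\begin{array}{c|rrr}
    & v_1 & v_2 & v_3 \\ \hline
x_1 & -\tfrac14 & -1 & \tfrac34 \\
x_2 & 0 & 1 & 0 \\
x_3 & \tfrac12 & 1 & -\tfrac12
\end{array}
\]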

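The solution is read from this table. Each row represents an equation of the
form (5.6) with the roles of the $v$'s and $x$'s interchanged, so that

\[
\begin{aligned}
x_1 &= -\left(-\tfrac14 v_1 - v_2 + \tfrac34 v_3\right), \\
x_2 &= -\left(0v_1 + 1v_2 + 0v_3\right), \\
x_3 &= -\left(\tfrac12 v_1 + v_2 - \tfrac12 v_3\right).
\end{aligned}
\]

Using the given values of the $v_i$, the solution is

\[ x_1 = 2, \qquad x_2 = -2, \qquad x_3 = -1. \]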

Further examples are suggested in Exercise 1 below. 
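The arithmetic of this example can be checked mechanically. The sketch below is illustrative only: the helper `pivot` and its 0-based indexing are assumptions made here, not notation from the text. It applies the three pivots of the example in order and then reads off the solution, using the fact that each row of the final table is an equation of the form $x_i + \sum_k d_{ik} v_k = 0$.

```python
from fractions import Fraction

def pivot(a, i, j):
    """Pivot on a[i][j] (0-based), following the four-case formula for B."""
    p = Fraction(a[i][j])
    if p == 0:
        raise ValueError("the pivot element must be nonzero")
    m, n = len(a), len(a[0])
    b = [[Fraction(0)] * n for _ in range(m)]
    for r in range(m):
        for s in range(n):
            if r == i and s == j:
                b[r][s] = 1 / p
            elif r == i:
                b[r][s] = a[i][s] / p
            elif s == j:
                b[r][s] = -a[r][j] / p
            else:
                b[r][s] = a[r][s] - a[r][j] * a[i][s] / p
    return b

A = [[2, -1, 3], [0, 1, 0], [2, 1, 1]]
v = [-3, 2, -1]                 # V = -Y, where Y = (3, -2, 1)

table = A
# The three pivots of the example: positions (2,2), (3,3), (1,1) in the
# text's 1-based numbering.
for i, j in [(1, 1), (2, 2), (0, 0)]:
    table = pivot(table, i, j)

# Every pivot here lies on the diagonal, so the final rows are labeled
# x_1, x_2, x_3 and the columns v_1, v_2, v_3, in that order.
# The final system reads X + (table) V = 0, hence X = -(table) V.
x = [-sum(table[r][s] * v[s] for s in range(3)) for r in range(3)]
```

With pivots off the diagonal one would also have to carry the row and column labels along, as the tables in the text do.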


Pivot operations can also be used to define an interesting relation between 
matrices, called combinatorial equivalence. However, since combinatorial 
equivalence is not used to develop the fundamental ideas of the next three 
chapters, its discussion is deferred to Chapter 9. Other equivalence relations 
are studied in Chapters 6 and 8. 
One further comment should be made. The particular form for a pivot
operation, derived above, is a consequence of our having started with a system
of equations in the form (5.6), or equivalently, in the form

\[ (I \mid A) \begin{pmatrix} V \\ X \end{pmatrix} = Z. \]

Had we begun with equations in the form (5.1),

\[ (-I \mid A) \begin{pmatrix} Y \\ X \end{pmatrix} = Z, \]

which is obtained simply by making the substitution $V = -Y$, then a second
form of pivot operation would be obtained. After a pivot on $a_{ij} \ne 0$, the
system

\[ (-I \mid A) \begin{pmatrix} Y \\ X \end{pmatrix} = Z \]

is replaced by the system

\[ (-I \mid B^*) \begin{pmatrix} Y^* \\ X^* \end{pmatrix} = Z, \]

where (as before) $Y^*$ coincides with $Y$ except that $x_j$ has replaced $y_i$
in the $i$th component, and $X^*$ coincides with $X$ except that $y_i$ has
replaced $x_j$ in the $j$th component. $B^*$ can be obtained from the matrix $B$
obtained in the pivot operation as originally defined by multiplying each
element of row $i$ and each element of column $j$ by $-1$; this implies that
$b^*_{ij} = b_{ij}$, but otherwise the minus signs that appeared in column $j$
of $B$ have been shifted to appear along row $i$ of $B^*$.

Both forms of pivot operations appear in the literature; the only difference
is in a few signs, and being aware of the existence of both forms you should
have no difficulty in following either.

Exercises 

1. Solve the following systems by the use of pivot operations: 

(i) Exercise 3, § 5.1, 

(ii) Exercise 4, § 5.1, 

(iii) Exercise 5, § 5.1. 

2. Let $A$ be an $n \times n$ matrix, let $a_{ij} \ne 0$, and let $B$ be the
matrix obtained from $A$ by pivoting on $a_{ij}$. Prove that

\[ \det B = a_{ij}^{-1} |A_{ij}|. \]

3. Given

\[ A = \begin{pmatrix} 1 & -2 & -1 \\ 2 & 3 & 1 \\ 0 & 5 & -2 \end{pmatrix}. \]

(i) Compute $\det A$.

(ii) Pivot on $a_{11} = 1$ to obtain a matrix $B$, and verify the result of
Exercise 2.

(iii) Pivot $B$ on $b_{22} = 7$ to obtain a matrix $C$, and again verify the
result of Exercise 2.

(iv) Pivot $C$ on $c_{33} = -\tfrac{29}{7}$ to obtain a matrix $D$.

(v) Show that $D = A^{-1}$.

(vi) Show that $\det A$ equals the product of the pivots used in
transforming $A$ to $A^{-1}$.

4. Let $A$ be an $n \times n$ matrix, let $a_{ij} \ne 0$, and let $B$ be the
$(n - 1) \times (n - 1)$ matrix obtained by pivoting $A$ on $a_{ij}$ and then
deleting the $i$th row and $j$th column. Prove that

\[ \det A = (-1)^{i+j} a_{ij} \det B. \]

5. Let $A$ be an $n \times n$ matrix partitioned as follows:

\[ A = \begin{pmatrix} A_1 & \beta \\ \gamma & d \end{pmatrix}, \]

where $A_1$ is an $(n - 1) \times (n - 1)$ matrix, $\beta$ is an $(n - 1)$
column vector, $\gamma$ is an $(n - 1)$ row vector, and $d$ is a nonzero scalar.
Then in terms of matrix multiplication $\beta\gamma$ is an
$(n - 1) \times (n - 1)$ matrix.

(i) Show that $\det A = d^{2-n} \det (dA_1 - \beta\gamma)$.

(ii) Show also that $dA_1 - \beta\gamma$ can be computed with at most
$2(n - 1)^2$ multiplications, and therefore by successive applications of this
method $\det A$ can be computed by no more than $2n^3/3$ multiplications.

(iii) For what values of $n$ does this method of evaluating $\det A$ require
more multiplications than those of Theorem 5.6 and Theorem 5.9?

6. As discussed at the close of this section, begin with a system of the
form

\[ (-I \mid A) \begin{pmatrix} Y \\ X \end{pmatrix} = Z, \]

where $a_{ij} \ne 0$. Pivot on $a_{ij}$, and verify that the resulting system is

\[ (-I \mid B^*) \begin{pmatrix} Y^* \\ X^* \end{pmatrix} = Z, \]

as described in the text.



CHAPTER 6 


Equivalence Relations 
on Matrices 


§ 6.1. Introduction

In § 5.5 we skirmished tentatively with the central problem of matrix
theory; we now need to describe the problem in more definite terms, because a
major portion of the next three chapters will bear directly on its solution. 

First we should recognize that we have already used matrices to describe 
two different mathematical entities: linear transformations and systems of 
linear equations. In Chapters 8 and 10 we shall see that matrices can repre- 
sent still other mathematical structures, each of which has its own distinctive 
problems and methods, which in turn can be used to define corresponding 
matrix concepts. 

For the case of linear transformations the matrix representation was made 
in terms of preselected bases which were fixed throughout the discussion. 
Presumably then, a different choice of bases would have resulted in a differ- 
ent matrix representation of the same transformation. But since a linear 
transformation is a vector space homomorphism, it is intrinsically independent 
of the coordinate systems (bases) of the spaces involved, and any matrix which 
represents a fixed linear transformation reflects the properties of that trans- 
formation. Therefore, as we vary the bases we can expect to obtain different 
representative matrices which share certain common properties. We wonder 
how such matrices are related to each other, and particularly how we can 
select a “simplest” matrix to represent that linear transformation. 

Now let us return to the problem of solving a system of linear equations. 
As described in § 5.5, the general method is to exchange the given system 
for an equivalent system, that is, one with precisely the same solutions as
the original. By so doing we exchange the given coefficient matrix for an- 
other matrix. Since a matrix reflects the properties of the system it repre- 
sents, different matrices which represent equivalent systems must have 
certain properties in common. Again we wonder how such matrices are 
related to each other and how we can find a “simplest” representative matrix. 

In order to deal effectively with such problems we need to use the concept
of equivalence relation. Relations are discussed in § 1.3 and § A.2, and
equivalence relations are treated in § A.9, but for convenience we summarize
here some essential facts concerning equivalence relations. Let $M$ denote any
nonvoid set; as a specific example we might think of $M$ as the set of all
$m \times n$ matrices whose entries are elements of a field $\mathcal{F}$. Let
the symbol $\sim$ denote a relation between the elements of $M$. Then the
relation $\sim$ is called an equivalence relation on $M$ if and only if three
properties are satisfied:

1. $\sim$ is reflexive; $A \sim A$ for every $A \in M$.

2. $\sim$ is symmetric; if $A \sim B$, then $B \sim A$.

3. $\sim$ is transitive; if $A \sim B$ and $B \sim C$, then $A \sim C$.

Any equivalence relation on $M$ separates $M$ into disjoint subsets, called
equivalence classes. Each equivalence class $[E]$ has the property that

if $A \in [E]$, then $B \in [E]$ if and only if $B \sim A$;

that is, all equivalent elements belong to the same equivalence class, and any
two elements of the same class are equivalent.
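The way an equivalence relation separates a set into disjoint classes can be made concrete with a few lines of code. This is a sketch only: the function name `equivalence_classes` and the sample relation (congruence of integers modulo 3) are illustrative assumptions and are not taken from the text.

```python
def equivalence_classes(items, related):
    """Group items into classes, assuming `related` is an equivalence relation.

    Because the relation is symmetric and transitive, comparing a new item
    with a single representative of each class is enough.
    """
    classes = []
    for x in items:
        for cls in classes:
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

# Hypothetical example: a ~ b when a - b is divisible by 3.
classes = equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0)
```

If `related` fails to be symmetric or transitive, the grouping depends on the order of the items and no longer describes a partition determined by the relation.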

As we shall see, many different equivalence relations arise naturally in the 
study of matrices. For each equivalence relation we shall want to describe 
the matrices which appear in each of the corresponding equivalence classes 
and, if possible, to find a simple standard form such that each equivalence 
class of M contains one and only one matrix which is in that form. Such a 
form is called canonical. Sometimes we are content to settle for less; namely,
it might suffice to obtain a standard form such that each equivalence class
contains more than one matrix in standard form, but if two matrices have 
the same standard form they must be in the same equivalence class. 

Exercises 

1. Determine which of the three properties of an equivalence relation are
satisfied by each of the following relations. 

(i) Similarity of plane triangles. 

(ii) Parallelism of lines on a plane. 

(iii) Strong inequality of real numbers. 
(iv) Divisibility of integers.

(v) Perpendicularity of lines on a plane. 

2. Describe a simple canonical form for each of the equivalence relations 
defined in Exercise 1 (i) and (ii). 

3. For fixed $m$ and $n$, consider all real $m \times n$ matrices. We define $A \sim B$
to mean that A has the same rank as B. 

(i) Verify that ~ is an equivalence relation. 

(ii) Describe the corresponding equivalence classes. 

(iii) Describe a simple canonical form for this notion of equivalence. 

4. A student waiter drops a plate, thereby separating it into a finite num- 
ber of disjoint pieces. Describe how this act defines an equivalence relation 
on the molecules of the plate such that the pieces of the decomposition form 
the equivalence classes. 


§ 6.2. Elementary Matrices

We now consider the matrix interpretation of the elementary operations 
which were introduced in § 5.5. There we were concerned with equivalent 
systems of linear equations, where two systems were considered equivalent 
if and only if they have the same solutions. Clearly, this defines an
equivalence relation on the collection of all systems of linear equations with
coefficients in a field $\mathcal{F}$. Each such system determines a unique rectangular matrix $A$
according to the representation of § 5.1. The elementary operations were so 
chosen that the application of each operation to a system produced a new 
system which was equivalent to the original. We shall see that a correspond- 
ing equivalence relation is induced on the matrices which represent the sys- 
tems. For this purpose we focus our attention on three elementary row opera- 
tions for matrices: 

P: permutation of two rows, 

M : multiplication of a row by a nonzero scalar, 

A: addition of one row to another. 

We first examine the effect of these operations on the identity matrix. 

Definition 6.1. An elementary matrix is any matrix which can be ob- 
tained by performing a single elementary row operation on the identity 
matrix. 

There are three types of elementary matrices, one for each type of elementary
row operation. To describe these we recall the notation of § 4.2; $E_{rs}$
denotes the square matrix with $e_{ij} = 0$ if $i \ne r$ or $j \ne s$ and
$e_{rs} = 1$.

P: Let $P_{ij}$ denote the matrix obtained from $I$ by permuting the $i$th and
$j$th rows. Then

\[ P_{ij} = I - E_{ii} + E_{ji} - E_{jj} + E_{ij}. \]

M: Let $M_i(c)$ denote the matrix obtained from $I$ by multiplying the $i$th
row by $c \ne 0$. $M_i(c)$ is obtained by adding $c - 1$ to the element in the
$(i, i)$ position, so

\[ M_i(c) = I + (c - 1)E_{ii}. \]

A: Let $A_{ij}$ denote the matrix obtained from $I$ by adding row $i$ to row
$j$, where $i \ne j$. Clearly,

\[ A_{ij} = I + E_{ji}. \]

The usefulness of the three elementary matrices $P_{ij}$, $M_i(c)$, and
$A_{ij}$ stems from the fact that an elementary row operation on an arbitrary
rectangular matrix $A$ may be performed by premultiplying $A$ (i.e.,
multiplying $A$ on the left) by the corresponding elementary matrix. But there
is more to be said in order to make our meaning precise. Since there is an
identity matrix of each dimension, there are elementary matrices of all three
types in each dimension. If $A$ is $m \times n$, then any premultiplying matrix
$B$ must have $m$ columns for $BA$ to be defined.

Theorem 6.1. Any elementary row operation can be performed on an 
m X n matrix A by premultiplying A by the corresponding m X m 
elementary matrix. 

proof: Exercise. 
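Theorem 6.1 is easy to test numerically before attempting the proof. In the sketch below the helper names `identity`, `perm`, `mult`, `add`, and `matmul` are illustrative assumptions, and rows are numbered from 0 rather than from 1 as in the text. Each elementary matrix is built by applying the row operation to $I$ and is then used to premultiply a $3 \times 4$ matrix.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def perm(m, i, j):
    """P_ij: interchange rows i and j of I."""
    e = identity(m)
    e[i], e[j] = e[j], e[i]
    return e

def mult(m, i, c):
    """M_i(c): multiply row i of I by the nonzero scalar c."""
    if c == 0:
        raise ValueError("the scalar must be nonzero")
    e = identity(m)
    e[i][i] = c
    return e

def add(m, i, j):
    """A_ij = I + E_ji: add row i of I to row j (i != j)."""
    if i == j:
        raise ValueError("rows must be distinct")
    e = identity(m)
    e[j][i] = 1
    return e

def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
swapped = matmul(perm(3, 0, 2), A)    # rows 0 and 2 interchanged
scaled = matmul(mult(3, 1, -2), A)    # row 1 multiplied by -2
added = matmul(add(3, 0, 1), A)       # row 0 added to row 1
```

A check of this kind on one matrix is of course not a proof; it only confirms that the definitions have been read correctly.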

Theorem 6.2. Every m X m elementary matrix is nonsingular. 
proof: Exercise. Show that each has m linearly independent rows. 

Theorem 6.3. The inverse of an elementary matrix of type P or type 
M is an elementary matrix of the same type. The inverse of an elemen- 
tary matrix of type A is the product of elementary matrices of type M 
and type A. 

proof: By expressing each of the elementary matrices in terms of the
$E_{rs}$ and using the results of Exercise 1, below, it is easy to verify that

\[
\begin{aligned}
P_{ij}^{-1} &= P_{ij}, \\
M_i^{-1}(c) &= M_i(c^{-1}), \\
A_{ij}^{-1} &= I - E_{ji} = M_i(-1)\, A_{ij}\, M_i(-1).
\end{aligned}
\]
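The three inverse formulas in this proof can be confirmed on small matrices. The sketch below is an illustration under assumptions: the helper names are chosen here, indices are 0-based, and the size $3$ and the particular rows are arbitrary.

```python
from fractions import Fraction

def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def perm(m, i, j):
    e = identity(m)
    e[i], e[j] = e[j], e[i]
    return e

def mult(m, i, c):
    e = identity(m)
    e[i][i] = c
    return e

def add(m, i, j):
    e = identity(m)
    e[j][i] = 1          # I + E_ji: adds row i to row j
    return e

I3 = identity(3)

# Type P: a permutation matrix is its own inverse.
p_check = matmul(perm(3, 0, 2), perm(3, 0, 2))

# Type M: the inverse of M_i(c) is M_i(1/c).
c = Fraction(5)
m_check = matmul(mult(3, 1, c), mult(3, 1, 1 / c))

# Type A: M_i(-1) A_ij M_i(-1) equals I - E_ji and inverts A_ij.
a = add(3, 0, 2)
a_inv = matmul(matmul(mult(3, 0, -1), a), mult(3, 0, -1))
a_check = matmul(a_inv, a)
```

The last identity says that subtracting row $i$ from row $j$ can be done by negating row $i$, adding it to row $j$, and negating row $i$ again.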

It should be observed that any type P elementary matrix is a product of 
elementary matrices of type M and type A (see Exercise 5, below). If we 
were interested in the most concise axiomatic treatment of row operations, 
we would consider only the latter two types. However, once we have observed
this relationship there is little to be gained by insisting upon using
it to replace the natural operation of interchanging rows. 

Exercises 

1. Prove that $E_{ik}E_{hj} = \delta_{kh}E_{ij}$, and therefore that each
$E_{ij}$ is either idempotent or nilpotent of index 2, according to whether
$i = j$ or $i \ne j$.

2. Prove Theorem 6.1. 

3. Prove Theorem 6.2. 

4. Carry out the calculations which establish the statements made in the 
proof of Theorem 6.3. 

5. Using Exercise 1, or otherwise, show that 

1 \ } = 

6. Write a sequence of elementary row operations whose only effect on A 
is to add a constant multiple of the ith row of A to the jth row of A. 

7. Calculate the determinant of each type of elementary matrix. 


§6.3. Row Equivalence 

The three types of elementary row operations were selected by a consideration
of algebraic processes which lead from one system of linear equations to
an equivalent system, that is, one with the same solutions as the original.
While it is clear that elementary row operations transform any system into 
an equivalent system, the converse is not so obvious — that any two equivalent 
systems can be derived from each other by a finite sequence of these elemen- 
tary row operations. Our investigation of this question will provide a non- 
trivial example of the general problem of equivalence and canonical forms, as 
discussed in § 6.1. As is often the case, side results of the investigation will 
prove to be more important than the answer to the original question. 

Definition 6.2. An m X n matrix B is said to be row equivalent to an 
m X n matrix A if and only if B can be obtained by performing a finite 
number of elementary row operations on A. 

The relation of row equivalence of m X n matrices is easily seen to be an 
equivalence relation. It is reflexive and transitive by its nature, and sym- 
metric because if B can be obtained from A by elementary row operations, 
then the reversed sequence of inverse operations applied to B will yield A. 
The collection of m X n matrices is thus partitioned into disjoint classes of
row equivalent matrices. 

Theorem 6.4. Row equivalent matrices have the same rank.

proof: If $B$ is row equivalent to $A$, then

\[ B = E_k \cdots E_2 E_1 A, \]

where the $E_i$ are elementary matrices and hence nonsingular. By Theorem
4.8, $A$ and $B$ have the same rank.

Now let $A = (a_{ij})$ be an $m \times n$ matrix and suppose that the $q$th
column is the first column in which a nonzero element, say $a_{pq}$, appears.
If we multiply row $p$ by $a_{pq}^{-1}$ and then interchange row $p$ and row 1,
we obtain a matrix with 1 in row 1 and column $q$. Then by adding suitable
multiples of row 1 to the other rows we obtain a matrix of the following form
which is row equivalent to $A$,

\[
B = \left(\begin{array}{ccc|c|ccc}
0 & \cdots & 0 & 1 & * & \cdots & * \\ \hline
0 & \cdots & 0 & 0 & & & \\
\vdots & & \vdots & \vdots & & R & \\
0 & \cdots & 0 & 0 & & &
\end{array}\right),
\]

where each $*$ denotes some scalar and where $R$ is an
$(m - 1) \times (n - q)$ matrix. We continue the process by operating with the
last $m - 1$ rows of $B$. In $R$ we find the first column in which a nonzero
element $b_{rs}$ appears, and multiply row $r$ by $b_{rs}^{-1}$. We then
interchange row $r$ of $B$ with row 2 of $B$, and as before produce zeros in
column $s$ of every row below row 2. Eventually we obtain a matrix which is row
equivalent to $A$ and has the following properties:

1. The first k rows are nonzero; the other rows are zero. 

2. The first nonzero element in each nonzero row is 1, and it appears in a 
column to the right of the first nonzero element of any preceding row. 

An example of a matrix in this form, for k = 4, m = 5, n = 8, is 

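\[
\begin{pmatrix}
0 & 1 & * & * & * & * & * & * \\
0 & 0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]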

A matrix which satisfies properties 1 and 2 is said to be in echelon form.
Suppose $B$ is a matrix in echelon form, and consider any nonzero row. The
first nonzero element of that row is $b_{ij} = 1$. In column $j$ above $b_{ij}$
there are elements which may or may not be zero. Below $b_{ij}$ the $j$th
column contains only zeros. Clearly, by a succession of row operations, the
elements above $b_{ij}$ in column $j$ can be replaced by zeros, and the
resulting matrix will still be in echelon form with the additional property,

3. The first nonzero element in each nonzero row is the only nonzero 
element in its column. 


Any matrix which is in echelon form and also satisfies property 3 is said to 
be in reduced echelon form. 

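A reduced echelon form for the matrix of the preceding example is

\[
\begin{pmatrix}
0 & 1 & 0 & * & 0 & * & * & 0 \\
0 & 0 & 1 & * & 0 & * & * & 0 \\
0 & 0 & 0 & 0 & 1 & * & * & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]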


Theorem 6.5. Any m X n matrix of rank k is row equivalent to a 
matrix in echelon form (also, reduced echelon form) with k nonzero rows. 

proof: Our previous discussion established the row equivalences which 
the theorem asserts, so we need only prove the statement concerning 
rank. The rank of any matrix cannot exceed the number of nonzero 
rows in that matrix, and the nonzero rows of an echelon matrix are 
linearly independent. Hence its rank is the number of its nonzero rows, 
and by Theorem 6.4 any matrix row equivalent to it has the same rank. 
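The reduction described before Theorem 6.5 can be carried out by a short routine. The following is a sketch, not the author's algorithm verbatim: the name `rref`, the use of exact fractions, and the decision to clear each pivot column above and below in a single pass (rather than first reaching echelon form and then clearing upward) are choices made here. Only the three elementary row operations are used.

```python
from fractions import Fraction

def rref(a):
    """Return (reduced echelon form of a, number of nonzero rows)."""
    rows = [[Fraction(x) for x in row] for row in a]
    m, n = len(rows), len(rows[0])
    lead = 0                                   # next row to receive a leading 1
    for col in range(n):
        # First row at or below `lead` with a nonzero entry in this column.
        p = next((r for r in range(lead, m) if rows[r][col] != 0), None)
        if p is None:
            continue
        rows[lead], rows[p] = rows[p], rows[lead]            # type P
        d = rows[lead][col]
        rows[lead] = [x / d for x in rows[lead]]             # type M
        for r in range(m):
            if r != lead and rows[r][col] != 0:
                f = rows[r][col]
                # Subtract f times the pivot row (types M and A combined).
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[lead])]
        lead += 1
        if lead == m:
            break
    return rows, lead

E, k = rref([[0, 2, 4], [1, 1, 1], [1, 3, 5]])
```

By Theorem 6.5 the second value returned, the number of nonzero rows, is the rank of the input matrix.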

We now apply the concept of row equivalence to give an independent
proof of Theorem 4.4: a matrix and its transpose have the same rank. Let
$E$ be a matrix which is row equivalent to $A$ and in reduced echelon form. If
$\rho(A) = k$, consider the $k$ column vectors whose only nonzero element is a 1
which is the first nonzero element of the row in which it appears. These $k$
column vectors are linearly independent, and every other column vector is a
linear combination of these. Since the columns of $E$ are the rows of $E'$,
$\rho(E') = k = \rho(E) = \rho(A)$. Also, $E = E_p E_{p-1} \cdots E_1 A$, so
$E' = A' E_1' E_2' \cdots E_p'$. Since the transpose of an elementary matrix is
an elementary matrix, and therefore nonsingular, $\rho(A') = \rho(E')$. Hence
$\rho(A') = \rho(A)$.

Theorem 6.6. The rank of any matrix is the dimension of its largest
nonsingular submatrix.

proof: Exercise. (By a submatrix of A we mean the array which is 
obtained by deleting a set of rows and a set of columns from A.) 
Theorem 6.7. An $n \times n$ matrix is nonsingular if and only if it is row
equivalent to the identity matrix.

proof: If $A$ is row equivalent to $I$, then $A$ must have rank $n$, and
thus be nonsingular. Conversely, suppose that $A$ is nonsingular. Then
it is row equivalent to a matrix $E$ in reduced echelon form and of rank $n$.
Hence $E = I$.


Theorem 6.8. A square matrix is nonsingular if and only if it is the 
product of elementary matrices. 

proof: If $A$ is nonsingular, then by Theorem 6.7

\[ E_k \cdots E_2 E_1 A = I \]

for suitable elementary matrices. Hence
$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$. Since the inverse of each elementary
matrix is the product of elementary matrices, $A$ is a product of elementary
matrices. The converse is trivial, since the product of nonsingular matrices is
nonsingular.

The usefulness of this theorem actually lies in its proof, because we have

\[ E_k \cdots E_2 E_1 I = A^{-1}, \]

which gives us a second way of calculating the inverse of a nonsingular matrix.
We determine the row operations needed to reduce $A$ to $I$. Those same row
operations when applied to $I$ yield $A^{-1}$. A similar method of calculating
$A^{-1}$ is given in the next section.


Example 


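To calculate the inverse of

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{pmatrix} \]

we write the block form $(I \mid A)$ and perform on this $3 \times 6$ matrix a
sequence of row operations which reduces $A$ to $I$, yielding $(B \mid I)$.
Then $B = A^{-1}$.

\[
\begin{aligned}
\left(\begin{array}{rrr|rrr}
1 & 0 & 0 & 1 & 2 & 3 \\
0 & 1 & 0 & 2 & 3 & 0 \\
0 & 0 & 1 & 0 & 1 & 2
\end{array}\right)
&\xrightarrow{R_2 - 2R_1}
\left(\begin{array}{rrr|rrr}
1 & 0 & 0 & 1 & 2 & 3 \\
-2 & 1 & 0 & 0 & -1 & -6 \\
0 & 0 & 1 & 0 & 1 & 2
\end{array}\right) \\
&\xrightarrow{R_3 + R_2}
\left(\begin{array}{rrr|rrr}
1 & 0 & 0 & 1 & 2 & 3 \\
-2 & 1 & 0 & 0 & -1 & -6 \\
-2 & 1 & 1 & 0 & 0 & -4
\end{array}\right) \\
&\xrightarrow{-R_2,\; -\frac14 R_3}
\left(\begin{array}{rrr|rrr}
1 & 0 & 0 & 1 & 2 & 3 \\
2 & -1 & 0 & 0 & 1 & 6 \\
\tfrac12 & -\tfrac14 & -\tfrac14 & 0 & 0 & 1
\end{array}\right) \\
&\xrightarrow{R_2 - 6R_3}
\left(\begin{array}{rrr|rrr}
1 & 0 & 0 & 1 & 2 & 3 \\
-1 & \tfrac12 & \tfrac32 & 0 & 1 & 0 \\
\tfrac12 & -\tfrac14 & -\tfrac14 & 0 & 0 & 1
\end{array}\right) \\
&\xrightarrow{R_1 - 2R_2 - 3R_3}
\left(\begin{array}{rrr|rrr}
\tfrac32 & -\tfrac14 & -\tfrac94 & 1 & 0 & 0 \\
-1 & \tfrac12 & \tfrac32 & 0 & 1 & 0 \\
\tfrac12 & -\tfrac14 & -\tfrac14 & 0 & 0 & 1
\end{array}\right).
\end{aligned}
\]

Thus

\[
A^{-1} = \begin{pmatrix}
\tfrac32 & -\tfrac14 & -\tfrac94 \\
-1 & \tfrac12 & \tfrac32 \\
\tfrac12 & -\tfrac14 & -\tfrac14
\end{pmatrix},
\]

a result which should be checked by showing that $A^{-1}A = I$.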
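The method of this example, reducing $A$ to $I$ while applying the same row operations to $I$, can be sketched as follows. The function name `inverse` is an assumption made here, the augmented matrix is stored as $(A \mid I)$ rather than $(I \mid A)$ (a difference of layout only), and the order of the row operations differs from the hand computation, which does not affect the result. The test matrix has rows $(1, 2, 3)$, $(2, 3, 0)$, $(0, 1, 2)$, as in the right half of the first display of the example.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix by row-reducing the augmented matrix (A | I)."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        p = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[col], aug[p] = aug[p], aug[col]                  # type P
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]                 # type M
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                          # right half is A^{-1}

A = [[1, 2, 3], [2, 3, 0], [0, 1, 2]]
A_inv = inverse(A)
# Check suggested in the text: A^{-1} A should be the identity.
product = [[sum(A_inv[r][k] * A[k][c] for k in range(3)) for c in range(3)]
           for r in range(3)]
```

A singular input is detected when some column has no usable nonzero entry, in agreement with Theorem 6.7.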

Theorem 6.9. B is row equivalent to A if and only if B = PA for some 
nonsingular matrix P . 

proof : Exercise. 

Finally we return to the question posed at the beginning of this section. 
Given any two systems of linear equations having the same solutions, are their 
corresponding matrices row equivalent? To answer this we first determine 
what meaning row equivalence of matrices has for the corresponding linear 
transformations. Let $A$, $B$ be $m \times n$ matrices; choose a basis
$\{\alpha_1, \ldots, \alpha_m\}$ for $\mathcal{V}_m$ and
$\{\beta_1, \ldots, \beta_n\}$ for $\mathcal{W}_n$. Let $T$ and $S$ be the linear
transformations from $\mathcal{V}_m$ to $\mathcal{W}_n$ determined by $A$ and
$B$ relative to this choice of bases; that is, $\alpha_i T$ and $\alpha_i S$ are
respectively represented in the $\beta$-basis by the $i$th rows of $A$ and $B$,
$i = 1, \ldots, m$.

Theorem 6.10. Relative to a pair of bases, let matrices $A$ and $B$
represent linear transformations $T$ and $S$. Then $A$ and $B$ are row
equivalent if and only if $\mathcal{R}_T = \mathcal{R}_S$.

proof: $\mathcal{R}_T$ is the subspace of $\mathcal{W}_n$ which is spanned by
the rows of $A$. If $B$ is row equivalent to $A$, the rows of $B$ are linear
combinations of the rows of $A$. Hence
$\mathcal{R}_S \subseteq \mathcal{R}_T$, and equality must hold since row
equivalence is a symmetric relation. Conversely, if
$\mathcal{R}_T = \mathcal{R}_S$, the row vectors of $B$ are linear combinations
of the row vectors of $A$, and each linear combination of rows can be performed
by elementary row operations.

Theorem 6.11. Two matrices $A$ and $B$ in reduced echelon form are
row equivalent if and only if $A = B$.

proof: Equal matrices are row equivalent. Conversely, let $A$ and $B$
be row equivalent matrices, both in reduced echelon form; as above,
$A$ and $B$ correspond to linear transformations $T$ and $S$. $A$ and $B$ have
the same rank $r$, so the first $r$ rows of $A$ and $B$, and only those rows,
are nonzero. For each $k = 1, 2, \ldots, r$ the first nonzero element of row
$k$ of $A$ and of $B$ must be 1; let $t_k$ denote the column in which that
element occurs in $A$, and $s_k$ the column in which it occurs in $B$. From the
form of a reduced echelon matrix we observe that

\[
\begin{aligned}
&\text{if } i < k, \text{ then } t_i < t_k, \\
&a_{k t_k} = 1, \\
&a_{i t_k} = 0 \text{ for all } i \ne k,
\end{aligned}
\]

with similar properties holding for $B$.

We assert that $s_k = t_k$ for each $k$. Suppose that for some index
$p \le r$, $s_p < t_p$ but that $s_k = t_k$ for $k < p$. Since
$\mathcal{R}_S = \mathcal{R}_T$ by Theorem 6.10, there exist scalars
$c_1, \ldots, c_r$, not all zero, such that

\[
\alpha_p S = \sum_{i=1}^{r} c_i \alpha_i T
           = \sum_{i=1}^{r} c_i \Bigl( \sum_{j=1}^{n} a_{ij} \beta_j \Bigr)
           = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{r} c_i a_{ij} \Bigr) \beta_j
           = \sum_{j=1}^{n} b_{pj} \beta_j.
\]

Hence for each $j = 1, 2, \ldots, n$,

\[ b_{pj} = \sum_{i=1}^{r} c_i a_{ij}. \]

Thus in particular,

\[ b_{p t_p} = \sum_{i=1}^{r} c_i a_{i t_p} = c_p \tag{6.1} \]

and

\[ b_{p s_p} = 1 = \sum_{i=1}^{r} c_i a_{i s_p}. \]

But $A$ is in reduced echelon form and $s_p < t_p$, so $a_{i s_p} = 0$ for
$i \ge p$. Hence for some $q < p$,

\[ c_q \ne 0. \]

Then $t_q = s_q < s_p$ since $B$ is in reduced echelon form. Thus
$a_{q t_q} = 1$, $a_{i t_q} = 0$ if $i \ne q$, and

\[ b_{p t_q} = \sum_{i=1}^{r} c_i a_{i t_q} = c_q \ne 0. \]

Since $t_q < s_p$, this contradicts the statement that row $p$ of $B$ has its
first nonzero element in column $s_p$. Hence the assumption that $s_p < t_p$
is false. If $t_p < s_p$, we can repeat the entire argument, reversing the roles
of $A$ and $B$ to obtain a similar contradiction. Hence $s_k = t_k$ for
$k = 1, 2, \ldots, r$; it follows from the nature of the reduced echelon form
that columns $t_1, t_2, \ldots, t_r$ of $A$ and $B$ are identical. Furthermore,
from (6.1),

\[ c_p = b_{p t_p} = 1. \]

Let $k \ne p$, $k \le r$; we have

\[ c_k = \sum_{i=1}^{r} c_i a_{i t_k} = b_{p t_k} = a_{p t_k} = 0, \]

since column $t_k$ of $B$ coincides with that of $A$. Thus $c_p = 1$ and
$c_k = 0$ if $k \ne p$. Therefore,

\[ b_{pj} = \sum_{i=1}^{r} c_i a_{ij} = a_{pj} \]

for all $p \le r$ and all $j$, so $A = B$.

From this theorem we draw two conclusions.

There is one and only one reduced echelon matrix which is row equivalent
to a given matrix $A$; that is, the reduced echelon form is canonical with
respect to row equivalence.

Two homogeneous systems of $m$ linear equations in $n$ unknowns are
equivalent (have the same solutions) if and only if their coefficient matrices
are row equivalent.

We note also that if row equivalence had been defined to include the trivial
row operations of adding or deleting a zero row, then the latter statement
above would be valid for any two homogeneous systems of linear equations
in $n$ unknowns.
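Because the reduced echelon form is canonical, row equivalence of two matrices of the same size can be decided by reducing both and comparing the results. The sketch below assumes the names `rref` and `row_equivalent`, which are introduced here for illustration only.

```python
from fractions import Fraction

def rref(a):
    """Reduced echelon form of a, computed with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in a]
    m, n = len(rows), len(rows[0])
    lead = 0
    for col in range(n):
        p = next((r for r in range(lead, m) if rows[r][col] != 0), None)
        if p is None:
            continue
        rows[lead], rows[p] = rows[p], rows[lead]
        d = rows[lead][col]
        rows[lead] = [x / d for x in rows[lead]]
        for r in range(m):
            if r != lead:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[lead])]
        lead += 1
        if lead == m:
            break
    return rows

def row_equivalent(a, b):
    """True when a and b have the same shape and the same reduced echelon form."""
    same_shape = len(a) == len(b) and len(a[0]) == len(b[0])
    return same_shape and rref(a) == rref(b)

# Two nonsingular matrices: both reduce to I.
r1 = row_equivalent([[1, 2], [3, 4]], [[1, 0], [0, 1]])
# Rank 1 with the same row space.
r2 = row_equivalent([[1, 2], [2, 4]], [[1, 2], [0, 0]])
# Rank 1 but different row spaces, hence not row equivalent.
r3 = row_equivalent([[1, 2], [2, 4]], [[1, 0], [0, 0]])
```

The third pair shows that equal rank alone does not imply row equivalence; by Theorem 6.10 the row spaces must coincide.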


Exercises 


1. Use the method of this section to calculate the inverse of each of the 
following matrices: 


(i) 


(ii) 



2. Determine which of the following matrices are row equivalent: 
3. Prove Theorem 6.6.

4. Prove Theorem 6.9. 

5. Show in detail how the two statements at the end of this section follow 
from previous theorems. 

6. Prove that if $A$ is $m \times n$ and $P_{ij}$ is $n \times n$, then
$AP_{ij}$ coincides with $A$ except that columns $i$ and $j$ are interchanged.

7. (i) Let $A$ be an $m \times (m + n)$ matrix of rank $m$, and let $E$ be the
reduced echelon form of $A$. Show that a permutation of the columns of $E$
transforms $E$ into block form $(I \mid B)$ where $B$ is $m \times n$.

(ii) Using Exercise 6, or otherwise, show that there exist nonsingular
matrices $P$ and $Q$ such that

\[ PAQ = (I \mid B). \]


§ 6.4. Equivalence

If we consider matrices as rectangular arrays, without regard to any sys- 
tems which they represent, it is as natural to perform elementary operations 
on columns as on rows. For this purpose we observe that a column operation 
on A is the same as a row operation on A'. Hence A may be transformed
into B by a succession of elementary column operations if and only if A ' can 
be transformed into B' by the same succession of row operations. That is, if 

B' = PA', 

then 

B = ( PA')' = AP' = AQ , 

where Q is nonsingular, since Q is the transpose of the nonsingular matrix P. 
Hence column operations can be performed on A by postmultiplying A by a 
suitable nonsingular matrix. 

We next propose to study the effect of changing A by both column opera- 
tions and row operations. If B is the resulting matrix, then 

B = PAQ , 

where P is the nonsingular matrix which performs the row operations and Q 
is the nonsingular matrix which performs the column operations. 

Definition 6.3. B is said to be equivalent to A if and only if B can be 
obtained from A by a finite number of elementary row and column 
operations. 

Theorem 6.12. B is equivalent to A if and only if B = PAQ for suit- 
able nonsingular matrices P and Q. 
proof: See the remarks preceding Definition 6.3.

Theorem 6.13. Equivalence of matrices is an equivalence relation.
proof: Exercise.

Theorem 6.14. An m × n matrix of rank k is equivalent to the m × n
matrix B in which b_11 = b_22 = ... = b_kk = 1, and b_ij = 0 otherwise.
proof: Let A be of rank k. If k = 0, then A = Z, and there is nothing
to prove. Otherwise, by row operations we obtain the reduced echelon
form of A, with k nonzero rows, the first nonzero element of each of which
is 1, and it is the only nonzero element in its column. By permuting the
columns we place these 1's in the first k diagonal positions, obtaining
the block form

    \begin{pmatrix} I_k & C \\ Z & Z \end{pmatrix},

where C is k × (n - k). Column operations are then used to produce zeros
in the last n - k columns of the first k rows.


Theorem 6.15. Two m × n matrices are equivalent if and only if they
have the same rank.

proof: If A and B are equivalent, then B = PAQ with P and Q nonsingular,
so ρ(A) = ρ(B). Conversely, if A and B have rank k, each is equivalent to
the matrix described in Theorem 6.14, and hence they are equivalent to
each other.
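
Theorem 6.15 reduces the question of equivalence to a rank computation. The
sketch below is one way to carry that out in Python with exact rational
arithmetic. The function names `rank` and `canonical_form` and the three
sample matrices are assumptions made for illustration only.

```python
from fractions import Fraction

def rank(M):
    """Rank of M, found by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0                                   # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def canonical_form(m, n, k):
    """The m x n matrix of Theorem 6.14 for rank k."""
    return [[1 if i == j and i < k else 0 for j in range(n)] for i in range(m)]

A = [[1, 2, 3], [2, 4, 6]]      # rank 1
B = [[0, 0, 5], [0, 0, 0]]      # rank 1
C = [[1, 0, 0], [0, 1, 0]]      # rank 2

print(rank(A), rank(B), rank(C))
print(canonical_form(2, 3, rank(A)))
```

Here A and B have the same rank and are therefore equivalent to the same
canonical matrix, while C, of rank 2, is equivalent to neither.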


From Theorem 6.15 we derive two immediate corollaries. The first is that
the form described in Theorem 6.14 is canonical with respect to equivalence. 
The second we state formally. 


Theorem 6.16. A square matrix is nonsingular if and only if it is equiv- 
alent to the identity matrix. 
proof: Apply Theorem 6.15.


Therefore, if A is nonsingular, there exist nonsingular matrices P and Q
such that

    I = PAQ,

    P^{-1}Q^{-1} = A,

    A^{-1} = QP.

Recall that P is obtained by performing on I the same row operations which
were performed on A, and that Q is obtained by performing on I the same
column operations which were performed on A, the combination of row and
column operations transforming A into I. This gives us another method of
computing A^{-1}, and the following scheme simplifies the calculation of P and
Q. Write I to the left of A and below A:

    \begin{array}{c|c} I & A \\ \hline & I \end{array}

Perform row and column operations on A as needed to transform A into I.
As each row operation is performed on A, perform the same row operation
on the matrix at the upper left of the array. Similarly, as each column
operation is performed on A, perform the same column operation on the lower
right-hand matrix. Then the final array is

    \begin{array}{c|c} P & I \\ \hline & Q \end{array}

and A^{-1} = QP.


Example 

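Let

    A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 3 \end{pmatrix}.

Beginning with the array

    \begin{array}{c|c} I & A \\ \hline & I \end{array}

we perform the following operations in turn, where R_i denotes row i and
C_j denotes column j. The matrix shown after each step is the one which
then occupies the position of A.

1. Interchange R_1 and R_2:

    \begin{pmatrix} 1 & 2 & 1 \\ 2 & -1 & 0 \\ -1 & 0 & 3 \end{pmatrix}

2. C_2 - 2C_1; C_3 - C_1:

    \begin{pmatrix} 1 & 0 & 0 \\ 2 & -5 & -2 \\ -1 & 2 & 4 \end{pmatrix}

3. R_2 - 2R_1; R_3 + R_1:

    \begin{pmatrix} 1 & 0 & 0 \\ 0 & -5 & -2 \\ 0 & 2 & 4 \end{pmatrix}

4. (1/2)R_3; interchange R_2 and R_3:

    \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & -5 & -2 \end{pmatrix}

5. C_3 - 2C_2:

    \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -5 & 8 \end{pmatrix}

6. R_3 + 5R_2:

    \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 8 \end{pmatrix}

7. (1/8)R_3:

    \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

We have

    P = \frac{1}{16} \begin{pmatrix} 0 & 16 & 0 \\ 0 & 8 & 8 \\ 2 & 1 & 5 \end{pmatrix},
    Q = \begin{pmatrix} 1 & -2 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix},

and therefore

    A^{-1} = QP = \frac{1}{16} \begin{pmatrix} 6 & 3 & -1 \\ -4 & 6 & -2 \\ 2 & 1 & 5 \end{pmatrix}.

This method of calculating A^{-1} is tedious to write, but all calculations are
easy ones.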
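
The scheme lends itself to a mechanical check. The sketch below assumes the
matrix A with rows (2, -1, 0), (1, 2, 1), (-1, 0, 3) and one possible sequence
of row and column operations reducing it to I. Each row operation is applied
to a working copy W of A and to P, and each column operation to W and to Q.
The helper names (`swap_rows`, `add_row`, `scale_row`, `add_col`) are
hypothetical and chosen only for this illustration.

```python
from fractions import Fraction as F

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def swap_rows(M, i, j):
    M[i], M[j] = M[j], M[i]

def add_row(M, i, j, c):
    """R_i := R_i + c * R_j."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

def scale_row(M, i, c):
    """R_i := c * R_i."""
    M[i] = [c * a for a in M[i]]

def add_col(M, i, j, c):
    """C_i := C_i + c * C_j."""
    for row in M:
        row[i] += c * row[j]

A = [[F(2), F(-1), F(0)], [F(1), F(2), F(1)], [F(-1), F(0), F(3)]]
W = [row[:] for row in A]   # working copy, to be reduced to I
P = identity(3)             # receives every row operation
Q = identity(3)             # receives every column operation

for M in (W, P):            # interchange R1 and R2
    swap_rows(M, 0, 1)
for M in (W, Q):            # C2 - 2C1; C3 - C1
    add_col(M, 1, 0, F(-2))
    add_col(M, 2, 0, F(-1))
for M in (W, P):            # R2 - 2R1; R3 + R1; (1/2)R3; interchange R2, R3
    add_row(M, 1, 0, F(-2))
    add_row(M, 2, 0, F(1))
    scale_row(M, 2, F(1, 2))
    swap_rows(M, 1, 2)
for M in (W, Q):            # C3 - 2C2
    add_col(M, 2, 1, F(-2))
for M in (W, P):            # R3 + 5R2; (1/8)R3
    add_row(M, 2, 1, F(5))
    scale_row(M, 2, F(1, 8))

A_inv = matmul(Q, P)
print(W == identity(3))                  # A has been reduced to I
print(matmul(A, A_inv) == identity(3))   # QP is indeed the inverse
```

Exact fractions are used so that the comparison with the identity matrix is
not disturbed by rounding.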


Exercises 

1. Use the method of § 6.4 to calculate the inverse of each matrix of
Exercise 1, § 6.3.

2. Prove Theorem 6.13.

3. Show that the form described by Theorem 6.14 is canonical for matrix
equivalence.

4. Which of the matrices of Exercise 2, § 6.3, are equivalent?

5. Show that if A is a symmetric n × n matrix of complex numbers, then
there exists a nonsingular matrix P such that PAP' is in canonical form for
matrix equivalence.

6. If A and B are equivalent, determine whether or not each of the
following pairs are equivalent:

(i) A' and B',

(ii) A^2 and B^2,

(iii) AB and BA.


§6.5. Similarity 

Thus far we have introduced two different equivalence relations for matrices 
which were suggested by processes for solving a system of linear equations. 
In this section we resume the study of linear transformations and their 
representative matrices. In so doing we discover an important interpreta- 
tion of equivalence of matrices, a special case of which leads us to a third 
equivalence relation, called similarity. Other equivalence relations will be 
considered in Chapter 8, where matrices are used to represent still another 
mathematical structure. 

In § 4.1 we derived the matrix representation of a linear transformation T
under the agreement that our discussion pertained to a fixed basis
{α_1, ..., α_m} of the domain V_m of T and a fixed basis {β_1, ..., β_n} of a
space W_n which contained the range of T. With this understanding, T is
represented uniquely by a matrix, but the matrix representation of T depends
upon the choice of bases. Now we are ready to determine the relationship
between two matrices, each of which represents the same T with respect to
independent choices of bases for V_m and W_n.

Let T be a linear transformation from V_m to W_n. With respect to bases
{α_1, ..., α_m} and {β_1, ..., β_n}, T is represented by a uniquely determined
matrix A. With respect to bases {γ_1, ..., γ_m} and {δ_1, ..., δ_n}, T is
represented by a matrix C. Thus

    α_i T = \sum_{j=1}^{n} a_{ij} β_j,

    γ_j T = \sum_{k=1}^{n} c_{jk} δ_k.

Let R be the linear transformation which maps γ_i onto α_i in V_m. Since R
maps a basis onto a basis, it is nonsingular, and relative to the γ-basis it is
represented by a nonsingular matrix P, where

    α_i = γ_i R = \sum_{j=1}^{m} p_{ij} γ_j.

Similarly, let S be the linear transformation which maps δ_j onto β_j in W_n.
S is represented relative to the δ-basis by a nonsingular matrix Q, where

    β_j = δ_j S = \sum_{k=1}^{n} q_{jk} δ_k.

The situation is represented graphically by the following scheme, where
subscripts on the matrices indicate the bases concerned.

                      A_{α,β}
        V_m = [α] ------------> W_n = [β]
            ^                       ^
          R | P_{γ,γ}       Q_{δ,δ} | S
            |                       |
        V_m = [γ] ------------> W_n = [δ]
                      C_{γ,δ}

                     Figure 6.1

We compute the T-image of α_i in two ways:

    α_i T = \sum_{j=1}^{n} a_{ij} β_j
          = \sum_{j=1}^{n} a_{ij} ( \sum_{k=1}^{n} q_{jk} δ_k )
          = \sum_{k=1}^{n} ( \sum_{j=1}^{n} a_{ij} q_{jk} ) δ_k,

    α_i T = ( \sum_{j=1}^{m} p_{ij} γ_j ) T
          = \sum_{j=1}^{m} p_{ij} ( \sum_{k=1}^{n} c_{jk} δ_k )
          = \sum_{k=1}^{n} ( \sum_{j=1}^{m} p_{ij} c_{jk} ) δ_k,

for i = 1, 2, ..., m. Since α_i T is a unique linear combination of the δ_k,

    \sum_{j=1}^{n} a_{ij} q_{jk} = \sum_{j=1}^{m} p_{ij} c_{jk}.

The left-hand side is the (i, k) element of AQ, and the right-hand side is the
(i, k) element of PC. Hence

    AQ = PC,

    A = PCQ^{-1},

or, in a form which reminds us of the bases used to obtain each matrix,

    A_{α,β} = P_{γ,γ} C_{γ,δ} Q_{δ,δ}^{-1}.

Theorem 6.17. Two m × n matrices A and C represent the same linear
transformation from V_m to W_n relative to two pairs of bases if and only
if A and C are equivalent.

proof: Our previous discussion has shown that if A and C represent
the same linear transformation, then

    A = PCQ^{-1}

for some nonsingular m × m matrix P and some nonsingular n × n
matrix Q. Hence A and C are equivalent. Conversely, if A and C are
equivalent, then for suitable nonsingular matrices P, Q

    A = PCQ^{-1}.

Choose any basis pair, {γ} for V_m and {δ} for W_n, and let T be the linear
transformation represented by C relative to this choice of bases. Let R
be the linear transformation on V_m defined by the matrix P relative to
the γ-basis, and let S be the linear transformation on W_n defined by the
matrix Q relative to the δ-basis.

Let α_i = γ_i R and β_j = δ_j S. Since PC = AQ we can reverse the order
of the calculations which follow Figure 6.1 to obtain

    α_i T = \sum_{k=1}^{n} ( \sum_{j=1}^{m} p_{ij} c_{jk} ) δ_k
          = \sum_{k=1}^{n} ( \sum_{j=1}^{n} a_{ij} q_{jk} ) δ_k
          = \sum_{j=1}^{n} a_{ij} β_j.

Hence A represents T relative to the α, β pair of bases.

Now let us reverse the roles of matrices and linear transformations in this
discussion. Suppose we have a single m × n matrix A and two pairs of
bases, α, β and γ, δ. Relative to each basis pair, A determines a linear
transformation, say T_1 and T_2. How are these transformations related?

As before, we let R and S be the linear transformations defined by γ_i R = α_i
and δ_j S = β_j. We have

    α_i T_1 = \sum_{j=1}^{n} a_{ij} β_j,

    γ_i T_2 = \sum_{j=1}^{n} a_{ij} δ_j.

Hence

    γ_i R T_1 = α_i T_1 = \sum_{j=1}^{n} a_{ij} β_j
              = \sum_{j=1}^{n} a_{ij} δ_j S = γ_i T_2 S.

Therefore

    R T_1 = T_2 S,

    T_1 = R^{-1} T_2 S.

Conversely, suppose that there exist nonsingular transformations R on V_m
and S on W_n such that T_1 = R^{-1} T_2 S. Choose a pair of bases {γ}, {δ} for
V_m and W_n, and define α_i = γ_i R, β_j = δ_j S. Then {α}, {β} forms a pair of
bases also. Let A be the matrix which represents T_2 relative to the γ- and
δ-bases. Then

    γ_i R T_1 = α_i T_1 = γ_i T_2 S = ( \sum_{j=1}^{n} a_{ij} δ_j ) S
              = \sum_{j=1}^{n} a_{ij} β_j,

so A represents T_1 relative to the α- and β-bases. Now the pictorial scheme is
especially helpful:

                    T_1, A_{α,β}
        V_m = [α] ------------> W_n = [β]
            ^                       ^
          R |                       | S
            |                       |
        V_m = [γ] ------------> W_n = [δ]
                        T_2

                     Figure 6.2

T_1 has the same effect on V_m as a change of coordinates in V_m (R^{-1},
indicated by going against the arrow), followed by T_2, followed by a change of
coordinates S in W_n. Notice that in Figure 6.1 a similar interpretation can
be made only by considering the vertical arrows as being reversed. We have
proved the following analogue of Theorem 6.17.

Theorem 6.18. Two linear transformations T_1 and T_2 from V_m to W_n
are represented relative to two pairs of bases by the same matrix if and
only if nonsingular linear transformations R on V_m and S on W_n exist
such that T_2 = R T_1 S^{-1}.

We now specialize our consideration to linear transformations of an
n-dimensional space into itself; many of the important transformations of
mathematics and physics are of this type. The matrices which represent
such transformations will be square, say n × n, and our work of the next
four chapters will principally concern square matrices.

For the present discussion, if V_m = W_n we can take the α- and β-bases to
be the same, and the γ- and δ-bases to be the same. In this case the
transformations R and S are equal, and the relation between the two matrices
A and C which represent the same linear transformation relative to the α-basis
and the γ-basis, respectively, is

    A_α = P_γ C_γ P_γ^{-1},

where P represents relative to the γ-basis the transformation which changes
bases from γ to α. This relation between matrices exists only for square
matrices and is a special type of equivalence B = PAQ wherein Q = P^{-1}.

Definition 6.4. Two n × n matrices A and C are said to be similar if
and only if

    A = PCP^{-1}

for some nonsingular matrix P.

Theorem 6.19. Two n × n matrices A and C are similar if and only if
there exist two bases {α} and {γ} for V_n and a linear transformation T
on V_n such that A represents T relative to the α-basis and C represents T
relative to the γ-basis.

proof: This theorem may be regarded as a special case of Theorem
6.17 in which only one space is involved, and therefore only two bases
rather than two pairs of bases are required. In the proof of Theorem 6.17
we let m = n, V_m = W_n, β_i = α_i, and δ_i = γ_i. Then S = R, Q = P,
and the conclusions follow.

Theorem 6.20. Two linear transformations T_1 and T_2 on V_n are
represented relative to two bases for V_n by the same matrix if and only if a
nonsingular transformation R on V_n exists such that T_2 = R T_1 R^{-1}.

proof: Exercise. 

It is important to distinguish clearly between the relations of equivalence
and similarity. To begin with, equivalence is defined for m × n matrices,
while similarity is defined only for n × n matrices. But in obtaining Theorems
6.19 and 6.20 as special cases of Theorems 6.17 and 6.18, the specialization
occurred not only by choosing m = n but also in the selection of bases.
Although both equivalence and similarity are defined for n × n matrices, they
are different relations; similar matrices are equivalent, but equivalent n × n
matrices are not necessarily similar.
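
A small example makes the distinction concrete. The identity matrix I and
the matrix diag(1, 2) both have rank 2, so they are equivalent by
Theorem 6.15; but PIP^{-1} = I for every nonsingular P, so I is similar only
to itself. The Python sketch below checks the rank condition and also compares
determinants, which must agree for similar matrices (Exercise 2 at the end of
this section). The helper names `det2` and `rank2` are illustrative
assumptions and handle 2 × 2 matrices only.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def rank2(M):
    """Rank of a 2 x 2 matrix."""
    if det2(M) != 0:
        return 2
    return 1 if any(x != 0 for row in M for x in row) else 0

I2 = [[1, 0], [0, 1]]
D = [[1, 0], [0, 2]]

same_rank = rank2(I2) == rank2(D)   # equal ranks: equivalent (Theorem 6.15)
same_det = det2(I2) == det2(D)      # equal determinants are needed for similarity

print(same_rank, same_det)
```

The ranks agree while the determinants do not, so the two matrices are
equivalent but not similar.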

The notion of similarity is particularly important because of Theorem 6.19.
Since similar matrices represent the same linear transformation, they must
share all properties of the transformation which are independent of any 
coordinate system. These are the intrinsic geometric properties of the trans- 
formation. A problem of special interest is to find a simple canonical form 
for every similarity class; this problem is the same as that of selecting a 
coordinate system in which a given linear transformation assumes a simple 
form which is determined by intrinsic geometric properties. The next chapter 
is concerned with its solution. 

Exercises 

1. Prove that similarity of matrices is an equivalence relation. 

2. Prove that if A and B are similar, then

(i) ρ(A) = ρ(B),

(ii) det A = det B.

3. Are AB and BA similar for all n × n matrices? What can be said if
either A or B is nonsingular?

4. In the analysis of three-phase power systems, an impedance matrix
often occurs in the form

    C = \begin{pmatrix} z_1 & z_2 & z_3 \\ z_3 & z_1 & z_2 \\ z_2 & z_3 & z_1 \end{pmatrix},

where z_j is a complex number for j = 1, 2, 3. Let

    P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & ε & ε^2 \\ 1 & ε^2 & ε \end{pmatrix},

where ε = (1/2)(-1 + i√3). Show that C is similar to a diagonal matrix D
and compute D. (Observe that ε^3 = 1, so that ε^2 + ε + 1 = 0.)

5. Let T be the linear transformation on E_3 whose matrix relative to the
{ε_1, ε_2, ε_3} basis is

    A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 0 & 2 \\ 1 & -2 & 3 \end{pmatrix}.

(i) Show that the vectors γ_1 = (1, 1, 0), γ_2 = (1, 0, 1), and γ_3 =
(1, -1, 1) are linearly independent and hence form a basis.

(ii) If R is the linear mapping defined by γ_i R = ε_i, i = 1, 2, 3, show
that R is represented relative to the γ-basis by the matrix

    P = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ -1 & 2 & -1 \end{pmatrix}.

(iii) Calculate the matrix C which represents T relative to the γ-basis
by two methods: first, express γ_i T in terms of the γ's for i = 1, 2, 3; then
compute P^{-1}AP.

6. Let T be a linear transformation of E_2 into E_3, whose matrix relative to
the bases {ε_1, ε_2} and {ε_1, ε_2, ε_3} is



Let new bases be defined by

    α_1 = ε_1 - 2ε_2,

    α_2 = ε_1 + ε_2,

and

    β_1 = ε_1 + ε_2,

    β_2 = ε_2 + ε_3,

    β_3 = ε_1 + ε_3.

Compute the matrix which represents T relative to the α- and β-bases.

7. (i) Show that any two idempotent matrices of the same dimension 
and rank are similar. 

(ii) Describe a form for idempotent matrices which is canonical with 
respect to similarity. 

8. If A and B are similar, determine whether or not each of the following
pairs are similar:

(i) A^k and B^k,

(ii) A' and B',

(iii) A^{-1} and B^{-1}, assuming A is nonsingular.


9. Prove Theorem 6.20 in detail. 



CHAPTER 7 


A Canonical Form 
for Similarity 


For the remainder of this book we shall restrict our investigation in two
ways, sometimes by necessity and sometimes for convenience.

We shall consider only square matrices, unless otherwise noted; thus linear
transformations will be regarded as mapping a space into itself. 

We shall assume that the scalar field is either the real or complex numbers. 
By this time you should have little difficulty in discerning which theorems 
can be extended beyond the limits imposed by these restrictions. 


§7.1. Characteristic Vectors and Values

The general problem which we undertake in this chapter was stated at the
end of § 6.5. A linear transformation T on V_n may be regarded as a
rearrangement of the points of the space, without reference to particular
coordinate systems. Indeed, those properties which distinguish T intrinsically
must hold in any coordinate system and hence must be invariant under a change
of coordinates. Since the matrices which represent one transformation relative
to different bases are similar, our investigation will make heavy use of
similarity, with a major objective being the derivation of a canonical form for
similarity. Another problem which we shall solve in this chapter is to
determine under what conditions a given matrix is similar to a diagonal matrix.
Since calculations with diagonal matrices are quite easy, the relation of this
problem to the simple representation of a linear transformation is apparent.

When we regard T as a rearrangement of the vectors of V, it is natural to
look for vectors which are mapped by T in some simple way. The null space,
for example, is the set of vectors mapped into θ. Also, we might look for a
fixed point, a vector which is mapped into itself. More generally, we search
for any vector which is mapped by T into a scalar multiple of itself:

    ξT = λξ for some scalar λ.

(The use of the Greek letter λ for a scalar is an exception to the notation
adopted for this book. It is used for characteristic values to conform with
generally accepted notation.) Clearly, for any T, θ is such a vector (indeed,
a fixed point), so we are interested only in nonzero vectors which have this
property.

For example, consider the linear transformation T defined on E_3 by the
matrix

    \begin{pmatrix} 1 & 2 & -1 \\ 2 & 0 & 2 \\ 1 & -2 & 3 \end{pmatrix}.

The point (a, b, c) is mapped by T into (a + 2b + c, 2a - 2c, -a + 2b + 3c).
In particular, for any value of a, (a, 0, a)T = 2(a, 0, a) and (a, -a, a)T =
0(a, -a, a). Hence the vectors γ_2 and γ_3 of the new basis described in
Exercise 5, § 6.5, are mapped by T into scalar multiples of themselves:
γ_2 T = 2γ_2 and γ_3 T = 0γ_3. The vector γ_1 is mapped by T in a manner which
is only slightly more complicated: γ_1 T = 2γ_1 + γ_2. This geometric
simplicity makes the γ-basis a natural (although oblique) coordinate system for
representing T (Figure 7.1), and we have seen that this geometric simplicity is
reflected in the algebraic representation of T relative to the γ-basis:

    \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

Figure 7.1

The work which ensues will show that no simpler representation of T is
possible.

Definition 7.1. A nonzero vector ξ such that

    ξT = λξ for some scalar λ

is called a characteristic vector of T. The scalar λ is called the characteristic
value of T which is associated with the characteristic vector ξ. The set
of all characteristic values of T is called the spectrum of T.

In the literature of matrices there is a wide variety of synonyms for charac- 
teristic vectors (eigenvectors, proper vectors, proper states) and for character- 
istic values (eigenvalues, proper values, characteristic numbers, characteristic 
roots, latent roots). 

Suppose that T is represented by A = (a_ij) and ξ is represented by the row
vector X = (x_1 ... x_n), relative to a fixed basis. If ξ is a characteristic
vector associated with the characteristic value λ, then

    ξT = λξ,

    XA = λX,

    X(A - λI) = Z.

This is the matrix form of a system of n linear homogeneous equations, and
by Theorem 5.1 a nonzero solution X exists if and only if A - λI is singular.
This occurs if and only if

    det(A - λI) = 0.

From our knowledge of determinants we see that

    det(A - λI) = \begin{vmatrix} a_{11} - λ & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - λ & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - λ \end{vmatrix} = 0

is a polynomial equation of degree n in λ, say

    (-1)^n λ^n + b_1 λ^{n-1} + \cdots + b_{n-1} λ + b_n = 0,

where the b's are sums of products of the a_ij. If the scalar field is the field
of real or complex numbers, we know by the fundamental theorem of algebra
and its corollaries that there are exactly n complex numbers λ (not necessarily
distinct from one another) which satisfy this equation. In order that we may
be sure that the characteristic values λ are in the scalar field of V, we shall
make the simplifying assumption during the remainder of this chapter that
F is the field of complex numbers.

Definition 7.2. The polynomial det(A - λI) is called the characteristic
polynomial of the matrix A. The equation det(A - λI) = 0 is called
the characteristic equation of A.

Definition 7.3. The characteristic values of a matrix A are the roots of
the characteristic equation of A.

Our discussion has established the following result.

Theorem 7.1. The characteristic values of a matrix A are the
characteristic values of the linear transformation represented by A in any
coordinate system.

Theorem 7.2. If A and B are similar, then A and B have the same
characteristic polynomial and hence the same characteristic values.

proof: If A = PBP^{-1}, then

    A - λI = PBP^{-1} - λI = P(B - λI)P^{-1},

    det(A - λI) = (det P) det(B - λI) (det P^{-1}) = det(B - λI).
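
Theorem 7.2 can be checked numerically on a sample pair of similar matrices.
The sketch below computes the coefficients of det(xI - A), which differs from
det(A - xI) only by the factor (-1)^n, using the Faddeev-LeVerrier recurrence.
That recurrence is a standard technique brought in here for convenience and is
not the method of the text. The matrices B and P and the function name
`char_poly` are assumptions made for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(xI - A), lowest power first."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * n + [Fraction(1)]      # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)                            # M_k = A M_{k-1} + c_{n-k+1} I
        for i in range(n):
            M[i][i] += coeffs[n - k + 1]
        AM = matmul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs

B = [[1, 0, -2], [0, 0, 0], [-2, 0, 4]]
P = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
A = matmul(matmul(P, B), P_inv)                     # A = P B P^{-1}

print(char_poly(B))
print(char_poly(A) == char_poly(B))
```

For this B the coefficients describe x^3 - 5x^2, that is,
det(B - xI) = -x^3 + 5x^2, and the similar matrix A yields the same list.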

Theorem 7.3. The characteristic values of a triangular matrix are the
diagonal elements. Thus the characteristic values of a diagonal matrix
are the diagonal elements.

proof: Exercise.

So far we have found out how to determine the characteristic values
λ_1, ..., λ_n of A. To determine characteristic vectors associated with the
value λ_i we solve for X the matrix equation

    X(A - λ_i I) = Z.

Suppose that X_i is the matrix representation of a characteristic vector
associated with λ_i. Then for any nonzero scalar c, cX_i is also a
characteristic vector associated with λ_i. Similarly, if Y_i is a characteristic
vector associated with λ_i, then X_i + Y_i is also a characteristic vector
associated with λ_i, unless X_i + Y_i = Z.

Theorem 7.4. Let λ be a characteristic value of the linear mapping T.
The set of all characteristic vectors of T associated with λ, together
with the zero vector, form a subspace C_λ of V, and ξT ∈ C_λ for every
ξ ∈ C_λ.

Exercises


1. Find the characteristic polynomial, the characteristic values, and the 
characteristic vectors of each of the following matrices: 


(i) 


(ii) 


(iii)



0 0 0 \ 

1 2 0 | 
' ’ o i or 
1 o 1/ 


2. Prove that if X is a row vector and D a diagonal matrix such that
XD^2 = Z, then XD = Z.

3. Prove Theorem 7.3.

4. Verify in detail the proof of Theorem 7.4.

5. Referring to Theorem 7.4, suppose that we choose any basis for C_λ and
extend it to a basis for V. Describe the matrix which represents T relative
to that basis.

6. Prove that if A is nonsingular then the characteristic values of A^{-1} are
the reciprocals of the characteristic values of A. What can be said about the
corresponding characteristic vectors?

7. Show that if X is a characteristic vector of A associated with the
value λ, then for any natural number k, X is a characteristic vector of A^k
associated with the characteristic value λ^k.

8. If λ_1, ..., λ_n are the characteristic values of A, show that

    det A = λ_1 λ_2 \cdots λ_n

by relating each side to the constant term of the characteristic polynomial
of A.

9. Prove that if S = {ξ_1, ..., ξ_k} is a set of characteristic vectors of T
associated respectively with distinct characteristic values λ_1, ..., λ_k, then
S is linearly independent.

10. A Markov matrix was defined in Exercise 8, § 4.4. Prove that every
characteristic value of a Markov matrix satisfies |λ| ≤ 1.

11. Prove that λ = 1 is a characteristic value of every Markov matrix.

§7.2. A Method of Diagonalization

As a start on the problem of determining what matrices are similar to a
diagonal matrix, the next theorem gives us a sufficient condition, which is
later proved to be necessary also. In addition, we obtain a method of
diagonalizing A; that is, we find P such that PAP^{-1} is diagonal.

Theorem 7.5. Let X_i = (x_i1, ..., x_in) be a characteristic vector of A
associated with λ_i, i = 1, 2, ..., n. If the vectors X_i span V, then the
matrix P = (x_ij) is such that

    PAP^{-1} = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix} = diag(λ_1, ..., λ_n) = D.

proof: Notice that the row vectors of P are the characteristic vectors X_i.
If these n vectors span V, they are linearly independent, so P is
nonsingular. We have

    X_i A = λ_i X_i,

so

    \sum_{k=1}^{n} x_{ik} a_{kj} = λ_i x_{ij}

for j = 1, 2, ..., n. Furthermore,

    PA = (c_ij),

where c_{ij} = \sum_{k=1}^{n} x_{ik} a_{kj} = λ_i x_{ij}, and

    DP = (b_ij),

where b_ij = λ_i x_ij. (You should verify these calculations.) Hence
PA = DP, and since P is nonsingular, PAP^{-1} = D.

Example 

We shall find the characteristic values, characteristic vectors, and a
diagonalizing matrix P for the matrix

    A = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 0 \\ -2 & 0 & 4 \end{pmatrix}.

The characteristic equation of A is

    0 = det(A - λI) = \begin{vmatrix} 1 - λ & 0 & -2 \\ 0 & -λ & 0 \\ -2 & 0 & 4 - λ \end{vmatrix}
      = (1 - λ)(-λ)(4 - λ) - (-2)(-λ)(-2)
      = -λ^3 + 5λ^2.

Hence the characteristic values of A are λ_1 = 0, λ_2 = 0, λ_3 = 5.
Let X = (x_1, x_2, x_3). Then

    XA = (x_1 - 2x_3, 0, -2x_1 + 4x_3)

and

    λX = (λx_1, λx_2, λx_3).

Necessary and sufficient conditions that X be a characteristic vector
associated with λ are therefore

    x_1 - 2x_3 = λx_1,

    0 = λx_2,

    -2x_1 + 4x_3 = λx_3.

For λ = 0 these reduce to

    x_1 = 2x_3.

Hence any nonzero vector of the form (2c, b, c) is characteristic. Two such
vectors which are linearly independent are

    X_1 = (2, 0, 1),

    X_2 = (0, 1, 0).

Notice in this case that it is possible to select two linearly independent
characteristic vectors both of which are associated with the same
characteristic value. For λ_3 = 5 the conditions reduce to

    -2x_1 = x_3,

    x_2 = 0.

Hence any nonzero vector of the form (a, 0, -2a) is characteristic. As a simple
vector of this form we choose

    X_3 = (1, 0, -2)

as a characteristic vector associated with λ_3 = 5. Then

    P = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -2 \end{pmatrix}

and det P = -5, which checks the linear independence of X_1, X_2, and X_3.
You should check the calculations which show that

    P^{-1} = -\frac{1}{5} \begin{pmatrix} -2 & 0 & -1 \\ 0 & -5 & 0 \\ -1 & 0 & 2 \end{pmatrix}

and

    PAP^{-1} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 5 \end{pmatrix}.

The geometric interpretation of these calculations is that in E_3 each point η
on the line determined by the origin and ξ_3 = (1, 0, -2) is mapped by T into
5η, while each point ζ on the plane determined by the origin, ξ_1 = (2, 0, 1),
and ξ_2 = (0, 1, 0) is mapped by T into 0ζ = θ. The three characteristic
vectors ξ_1, ξ_2, and ξ_3 are linearly independent and may be chosen as a basis
for V_3. Relative to that basis, T is represented by the diagonal matrix
D = diag(0, 0, 5).
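
The calculations of this example can be verified mechanically. The sketch
below takes A to be the matrix with rows (1, 0, -2), (0, 0, 0), (-2, 0, 4), as
implied by the expression for XA above, and uses exact fractions for P^{-1}.
The variable names are chosen for illustration only.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0, -2], [0, 0, 0], [-2, 0, 4]]

# The rows of P are the characteristic vectors X1, X2, X3.
P = [[2, 0, 1], [0, 1, 0], [1, 0, -2]]
P_inv = [[F(2, 5), 0, F(1, 5)], [0, 1, 0], [F(1, 5), 0, F(-2, 5)]]

identity_check = matmul(P, P_inv)       # should be the identity matrix
PA = matmul(P, A)                       # row i should equal lambda_i * X_i
D = matmul(PA, P_inv)                   # should be diag(0, 0, 5)

print(identity_check)
print(PA)
print(D)
```

The rows of PA are 0·X_1, 0·X_2, and 5·X_3, which is the relation PA = DP
used in the proof of Theorem 7.5.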

When we attempt to diagonalize a given matrix A by the preceding method
there are three possible situations:

If the characteristic values are all distinct, then there exist n linearly
independent characteristic vectors; thus A can be diagonalized. (Proved
later.)

If the characteristic values are not distinct, as in the preceding example,
it might still be possible to find n linearly independent characteristic vectors
so that A can be diagonalized.

If the characteristic values are not distinct, it can happen that no set of
n linearly independent characteristic vectors exists. Then A cannot be
diagonalized.

These three cases are illustrated in Exercise 1, below.
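
The second and third cases can be told apart by counting independent
characteristic vectors. For a characteristic value λ, the solutions X of
X(A - λI) = Z form a space of dimension n - ρ(A - λI). The sketch below
computes this number for two sample matrices: the 3 × 3 matrix with rows
(1, 0, -2), (0, 0, 0), (-2, 0, 4), and a 2 × 2 matrix J with the repeated
characteristic value 1. The function names `rank` and `eigenspace_dim` and
the choice of J are assumptions made for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of M, found by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def eigenspace_dim(A, lam):
    """Dimension of the solution space of X(A - lam*I) = Z."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[1, 0, -2], [0, 0, 0], [-2, 0, 4]]   # characteristic values 0, 0, 5
J = [[1, 1], [0, 1]]                      # characteristic values 1, 1

print(eigenspace_dim(A, 0) + eigenspace_dim(A, 5))   # total for A
print(eigenspace_dim(J, 1))                          # total for J
```

For A the total is 3 = n, so A can be diagonalized. For J only one
independent characteristic vector exists, fewer than n = 2, so J cannot be
diagonalized.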





Exercises 

1. For each of the following matrices determine the characteristic values
and corresponding characteristic vectors; if the matrix is similar to a diagonal
matrix, find P and show that PAP^{-1} is diagonal:
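
    (i)   \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 1 \\ 1 & 3 & -1 \end{pmatrix}

    (ii)  \begin{pmatrix} 7 & 4 & -1 \\ 4 & 7 & -1 \\ -4 & -4 & 4 \end{pmatrix}

    (iii) \begin{pmatrix} 2 & -2 & 3 \\ 10 & -4 & 5 \\ 5 & -4 & 6 \end{pmatrix}

    (iv)  \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & -2 & -3 \end{pmatrix}

    (v)   \begin{pmatrix} 1 & 0 & 0 \\ 3 & -2 & 6 \\ 0 & 0 & 1 \end{pmatrix}
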

2. Which matrices of Exercise 1, § 7.1, are similar to a diagonal matrix?
Without computing P, write the diagonal form of each such matrix.

3. Prove that a transformation is nonsingular if and only if none of its
characteristic values are zero.

4. Prove that a transformation is nilpotent if and only if all of its
characteristic values are zero. Show by a particular example that part of the
preceding statement is false if one is restricted to work within the field of
real numbers.

5. (i) Without expanding a 4 × 4 determinant show that a - b, a - c,
and a - d are characteristic values of the matrix



(ii) What is the fourth characteristic value?

(iii) Find general conditions on the numbers a, b, c, d which are
sufficient that A be similar to a diagonal matrix.

6. Prove that AB and BA have the same characteristic values by
establishing the following assertions.

(i) If D = PAQ is the canonical matrix equivalent to A (Theorem 6.14),
then DC and CD have the same characteristic equation for all C.

(ii) PABP^{-1} and Q^{-1}BAQ have the same characteristic equation if
P and Q are as given in (i).

(iii) AB and BA have the same characteristic equation.

7. Prove that if either A or B is nonsingular, then AB and BA are similar.

8. Give an example of two matrices which have the same characteristic
equation but are not similar.

9. Prove Theorem 7.5 by means of a geometric argument.

§7.3. Minimal Polynomial of a Matrix

In Theorem 4.2 and subsequent remarks it was proved that the set of all
n × n matrices over a field forms a linear algebra of dimension n^2. Hence
in this algebra the n^2 + 1 elements

    I = A^0, A, A^2, ..., A^{n^2}

are linearly dependent for any n × n matrix A. By Theorem 2.8, a set is
dependent only if some "vector" of the set is a linear combination of the
ones which precede it. Let m be the integer such that {I, A, ..., A^{m-1}} is
independent but {I, A, ..., A^m} is dependent. For suitable scalars we have

    a_0 A^m + a_1 A^{m-1} + \cdots + a_{m-1} A + a_m I = Z,

where a_0 ≠ 0. Hence for b_k = a_k a_0^{-1} we have

    A^m + b_1 A^{m-1} + \cdots + b_{m-1} A + b_m I = Z.

This is a polynomial equation in the matrix A. The corresponding scalar
polynomial is

    M(x) = x^m + b_1 x^{m-1} + \cdots + b_m.

A polynomial, such as M, which has 1 as the coefficient of its largest power
is called monic.

Definition 7.4. The minimal polynomial of the matrix A is the monic
scalar polynomial

    M(x) = x^m + a_1 x^{m-1} + \cdots + a_m

of least degree such that

    A^m + a_1 A^{m-1} + \cdots + a_m I = Z.

We have seen that each matrix has a minimal polynomial, and as an
exercise you may prove that the minimal polynomial of A is unique. A word of
caution is justified at this point. The form of Definition 7.4 tempts us to
define the minimal polynomial of A as "the monic polynomial of least degree
for which A is a zero." However, we must distinguish between matrix
polynomials and scalar polynomials. Since the properties of matrices and scalars
are quite different, there is no a priori justification for substituting the
matrix A for the scalar x. But every scalar polynomial determines a matrix
polynomial which is formed by replacing the scalar x by the matrix X and
each scalar coefficient a_i by the scalar matrix a_i I. In this sense it is
correct to say that the minimal polynomial of A is the monic polynomial M of
least degree and with scalar matrices as coefficients such that M(A) = Z.

The difficulties that arise in handling polynomials having matrices as 
coefficients are due principally to the existence of zero divisors and the 
lack of commutativity. For example, the matrix polynomial equation 

$$(X - A)(X - B) = Z$$

does not imply that $X = A$ or $X = B$, because a product of nonzero matrices 
may be zero. Also, the matrix polynomial 

$$X^2 - AX - BX + AB$$

cannot be factored as 

$$(X - A)(X - B)$$

unless it is known that $X$ and $B$ commute. On the other hand, we know 
that any scalar matrix $B = bI$ commutes with any matrix, so that any 
factorization of a scalar polynomial 

$$x^m + a_1 x^{m-1} + \cdots + a_m = (x - r_1) \cdots (x - r_m)$$

determines a like factorization of the corresponding matrix polynomial 

$$X^m + a_1 X^{m-1} + \cdots + a_m I = (X - r_1 I) \cdots (X - r_m I).$$
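
A small worked instance may make the zero-divisor phenomenon concrete. The matrices below are chosen here purely for illustration and do not appear in the text: with $A = Z$ and $B = I$, the idempotent matrix $X$ satisfies the equation although it equals neither $A$ nor $B$.

```latex
X = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A = Z, \quad B = I:
\qquad
(X - A)(X - B) = X(X - I)
  = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}
  = Z,
\qquad X \neq A, \quad X \neq B .
```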

A remarkable result concerning matrix polynomials is the Hamilton-Cayley 
theorem which may be stated loosely as “Every square matrix satisfies its 
characteristic equation.” To see that this is so, consider a square matrix A 
and assume for the present that its characteristic vectors span $\mathcal{V}$. Let the 
characteristic polynomial of $A$ be 

$$(-1)^n \lambda^n + b_1 \lambda^{n-1} + \cdots + b_n.$$

Let $B$ be the matrix defined by 

$$B = (-1)^n A^n + b_1 A^{n-1} + \cdots + b_n I.$$

If $X$ is a characteristic vector of $A$ associated with the characteristic value $\lambda$, 
then by direct calculation using Exercise 7, § 7.1, we have 

$$XB = [(-1)^n \lambda^n + b_1 \lambda^{n-1} + \cdots + b_n] X = 0 \cdot X = Z.$$

The transformation $S$ associated with $B$ maps every characteristic vector of 
$A$ into zero. By assumption, the characteristic vectors of $A$ span $\mathcal{V}$, so $S$ is 
the zero transformation, and $B$ must be the zero matrix. In the next section 
we shall prove the same result for any square matrix. 

The Hamilton-Cayley theorem shows that the degree of the minimal poly- 
nomial of A does not exceed n. It can be proved that the minimal polynomial 
of A divides any polynomial p such that p(A) = Z; hence the minimal 
polynomial divides the characteristic polynomial. 
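
The statement "every square matrix satisfies its characteristic equation" can be checked on an example. The sketch below is only an illustration: it computes the monic polynomial $\det(xI - A)$, which differs from $\det(A - xI)$ only by the factor $(-1)^n$, using the Faddeev-LeVerrier recurrence. That recurrence is an assumption of this sketch and not a method used in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_monic(a):
    """Coefficients [c_0, ..., c_(n-1), 1] of det(xI - A), lowest degree first."""
    n = len(a)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]  # M_0 = Z
    for k in range(1, n + 1):
        # M_k = A M_(k-1) + c_(n-k+1) I
        m = mat_mul(a, m)
        for i in range(n):
            m[i][i] += coeffs[n - k + 1]
        # c_(n-k) = -trace(A M_k) / k
        am = mat_mul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


def evaluate_at_matrix(coeffs, a):
    """Horner evaluation of a scalar polynomial with x replaced by A."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c  # adds the scalar matrix c I
    return result


# Example matrix chosen for illustration (a companion-type matrix).
A = [[0, 1, 0], [0, 0, 1], [0, -9, 6]]
P = char_poly_monic(A)        # coefficients of x^3 - 6x^2 + 9x
PA = evaluate_at_matrix(P, A) # should be the zero matrix Z
```

Here `PA` is the zero matrix, in agreement with the theorem. Note that the coefficient $c_i$ enters `evaluate_at_matrix` as the scalar matrix $c_i I$, exactly as the discussion of matrix polynomials requires.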

Exercises 

1. Show that the minimal polynomial of A is unique. 

2. Prove that a matrix and its transpose have the same characteristic 
polynomial. 

3. Prove the Hamilton-Cayley theorem for $2 \times 2$ matrices by direct 
calculations with matrices. 

4. Prove that if $A$ is similar to $B$ and if $p(X)$ is any matrix polynomial 
with scalar coefficients, then $p(A)$ is similar to $p(B)$. 

5. Prove that if $A$ is similar to the scalar matrix $cI$, then $A = cI$. 

6. (i) Determine all real $2 \times 2$ matrices $A$ which satisfy $A^2 = -I$. 

(ii) Show that no real $3 \times 3$ matrix $A$ satisfies $A^2 = -I$. 

7. (i) Let $C$ be a square matrix with 1 in every position on the 
superdiagonal ($c_{i,i+1} = 1$), $c_{n1}, \ldots, c_{nn}$ in the last row, and zeros elsewhere. Show 
that the characteristic polynomial of $C$ is 

$$(-1)^n [\lambda^n - c_{nn} \lambda^{n-1} - c_{n,n-1} \lambda^{n-2} - \cdots - c_{n2} \lambda - c_{n1}].$$

(ii) Deduce that every polynomial whose leading coefficient is ±1, 
according to whether its degree is even or odd, is the characteristic poly- 
nomial of some matrix. 

8. Use the Hamilton-Cayley theorem to solve Exercise 8, § 5.4. 


§7.4. Invariant Subspaces 

We have previously observed that the problem of finding a simple standard 
form for each similarity class of n X n matrices is equivalent to selecting 
a coordinate system which takes full advantage of the intrinsic geometric 
properties of the linear transformation which is represented by each matrix 
of that class. One very useful geometric property, which we first met as a 
special case in Exercise 8, § 3.2, is the existence of a subspace $\mathcal{M}$ which is 
mapped into itself by $T$. Clearly the range and null spaces of $T$ are such 
spaces, as is the space $\mathcal{C}_\lambda$ of Theorem 7.4. 

Definition 7.5. Let $T$ be a linear transformation on $\mathcal{V}$. A subspace $\mathcal{M}$ of 
$\mathcal{V}$ is said to be invariant under $T$ (or $T$-invariant) if and only if $\xi T \in \mathcal{M}$ 
for every $\xi \in \mathcal{M}$. 

A $T$-invariant subspace therefore has the property that each vector of the 
subspace is mapped by $T$ into a vector in the same subspace. A useful 
notation is to let $\mathcal{M}T$ denote the set of all images of the vectors of $\mathcal{M}$. Then a 
space is invariant under $T$ if and only if $\mathcal{M}T \subseteq \mathcal{M}$. If $\mathcal{M}$ is a $T$-invariant 
subspace of $\mathcal{V}$, it is possible to study the behavior of $\mathcal{M}$ under $T$, ignoring 
the effect of $T$ on the rest of $\mathcal{V}$, since vectors of $\mathcal{M}$ are mapped into vectors 
of $\mathcal{M}$. The transformation $T$ on $\mathcal{V}$ thus defines a transformation $T_{\mathcal{M}}$ on $\mathcal{M}$, 
called $T$ restricted to $\mathcal{M}$. 

The matrix interpretation of this notion is illuminating. Let $\{\alpha_1, \ldots, \alpha_k\}$ 
be a basis for the invariant space $\mathcal{M}$, and let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for $\mathcal{V}$. 
Then $\alpha_i T = \sum_{j=1}^{k} a_{ij} \alpha_j$ for $i = 1, \ldots, k$, and the matrix of $T$ in this new 
basis is of the block form 

$$A = \begin{pmatrix} A_1 & Z \\ A_3 & A_4 \end{pmatrix},$$

where $A_1$ is a $k \times k$ matrix. 

In the event that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$ where $\mathcal{M}$ and $\mathcal{N}$ are both invariant under $T$, 
a basis for $\mathcal{M}$ together with a basis for $\mathcal{N}$ form a basis for $\mathcal{V}$, and relative to 
that basis $T$ is represented by a matrix of the form 

$$\begin{pmatrix} A_1 & Z \\ Z & A_2 \end{pmatrix},$$

where $A_1$ is a square array of dimension equal to the dimension of $\mathcal{M}$ and 
$A_2$ is a square array of dimension equal to the dimension of $\mathcal{N}$. 
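
A small worked example may help fix the row-vector convention used here; the matrix is chosen for illustration and is not taken from the text. Because vectors are written as rows and $T$ acts on the right, invariance of the subspace spanned by the first basis vectors forces a zero block in the upper right corner. Below, $\epsilon_1, \epsilon_2$ denote the standard basis vectors of the plane.

```latex
A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad
\epsilon_2 A = (0, 2) = 2\epsilon_2, \qquad
\epsilon_1 A = (1, 1) = \epsilon_1 + \epsilon_2 .
% Basis adapted to the invariant subspace spanned by \epsilon_2:
\alpha_1 = \epsilon_2,\ \alpha_2 = \epsilon_1
\;\Longrightarrow\;
\alpha_1 T = 2\alpha_1, \quad \alpha_2 T = \alpha_1 + \alpha_2,
\qquad \text{matrix } \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}.
% Replacing \alpha_2 by the characteristic vector (1, -1) gives two invariant summands:
(1, -1) A = (1, -1)
\;\Longrightarrow\;
\text{matrix } \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.
```

In the first basis only one of the two spanned subspaces is invariant, so a nonzero block remains below the diagonal; in the second basis the plane is a direct sum of two invariant subspaces and the matrix is block diagonal.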

Actually there is an abundance of $T$-invariant subspaces of a special type, 
called cyclic subspaces, which also arose in Exercise 8, § 3.2. Let $T$ be a 
linear transformation on a finite-dimensional space $\mathcal{V}$, and let $\xi$ be any 
nonzero vector of $\mathcal{V}$. We consider the vectors $\xi, \xi T, \xi T^2, \ldots$. By 
Theorem 2.8, for some value of $k$, $\{\xi, \xi T, \ldots, \xi T^{k-1}\}$ is linearly independent but 
$\xi T^k \in [\xi, \xi T, \ldots, \xi T^{k-1}]$. Hence 

$$\xi T^k = \sum_{i=1}^{k} c_i \, \xi T^{i-1}$$

for suitable $c_i$. But this implies that the space $[\xi, \xi T, \ldots, \xi T^{k-1}]$ is $T$-invariant, 
since 

$$\Big( \sum_{i=1}^{k} a_i \, \xi T^{i-1} \Big) T = \sum_{i=1}^{k-1} a_i \, \xi T^{i} + a_k \, \xi T^k.$$

Definition 7.6. Let $T$ be any linear transformation on $\mathcal{V}$, $\xi$ any nonzero 
vector of $\mathcal{V}$, and $k$ the largest positive integer for which $\{\xi, \xi T, \ldots, \xi T^{k-1}\}$ 
is linearly independent. The space $[\xi, \xi T, \ldots, \xi T^{k-1}]$ is called the 
$T$-cyclic subspace generated by $\xi$, and the basis $\{\xi, \xi T, \ldots, \xi T^{k-1}\}$ for 
that subspace is denoted $\{(\xi)_T\}$ and called the $T$-cyclic basis generated 
by $\xi$. More generally, any linearly independent set of the form 
$\{(\xi_1)_T, (\xi_2)_T, \ldots, (\xi_r)_T\}$ is called a $T$-cyclic basis for the space which 
it spans. 

Again the matrix interpretation is useful. Let $T$ be a linear transformation 
on $\mathcal{V}_n$, and let $\mathcal{S}$ denote the cyclic subspace, of dimension $k$, generated by a 
vector $\xi$. The transformation $T_{\mathcal{S}}$ is represented relative to the basis 
$\{\xi, \xi T, \ldots, \xi T^{k-1}\}$ by the $k \times k$ matrix 

$$C_1 = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ c_1 & c_2 & c_3 & \cdots & c_k \end{pmatrix} \qquad (7.1)$$

which has $c_1, \ldots, c_k$ in the last row, 1 in each superdiagonal position, and 
0 elsewhere. A matrix of this form was studied in Exercise 7, § 7.3, where 
you were asked to show that the characteristic polynomial of $C_1$ is 

$$P_1(x) = (-1)^k (x^k - c_k x^{k-1} - \cdots - c_1).$$

$C_1$ is called the companion matrix of the polynomial $P_1$. 
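
The passage from a vector $\xi$ to its cyclic basis and companion matrix can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, exact rational arithmetic is assumed, and vectors are treated as rows acting on the left of the matrix, following the text's convention $\xi T$.

```python
from fractions import Fraction


def vec_mat(v, a):
    """Row vector times matrix, matching the convention xi T of the text."""
    return [sum(v[i] * a[i][j] for i in range(len(v))) for j in range(len(a[0]))]


def solve_combination(vectors, target):
    """Return c with sum(c[i] * vectors[i]) == target, or None if there is none.

    Assumption of this sketch: the given vectors are linearly independent.
    """
    m, n = len(vectors), len(target)
    rows = [[Fraction(vectors[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    done = 0
    for col in range(m):
        pivot = next((r for r in range(done, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[done], rows[pivot] = rows[pivot], rows[done]
        lead = rows[done][col]
        rows[done] = [x / lead for x in rows[done]]
        for r in range(n):
            if r != done and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[done])]
        done += 1
    if any(rows[r][m] != 0 for r in range(done, n)):
        return None
    return [rows[i][m] for i in range(m)]


def cyclic_basis(xi, a):
    """Return (basis, c): basis = [xi, xi T, ..., xi T^(k-1)] and
    xi T^k = c[0]*basis[0] + ... + c[k-1]*basis[k-1]. Requires xi != 0."""
    basis = []
    current = [Fraction(x) for x in xi]
    while True:
        c = solve_combination(basis, current)
        if c is not None:
            return basis, c
        basis.append(current)
        current = vec_mat(current, a)


def companion(c):
    """Matrix of the form (7.1): 1 on the superdiagonal, c_1..c_k in the last row."""
    k = len(c)
    rows = [[Fraction(int(j == i + 1)) for j in range(k)] for i in range(k - 1)]
    return rows + [list(c)]
```

For the diagonal matrix with entries 2, 2, 3 and $\xi = (1, 0, 1)$, the sketch finds $k = 2$ and the relation $\xi T^2 = -6\xi + 5\,\xi T$, so the companion matrix has last row $(-6, 5)$.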

It can be proved that $\mathcal{V}_n$ is a direct sum of cyclic subspaces, and hence 
that $T$ can be represented by a matrix of diagonal block form 

$$\begin{pmatrix} C_1 & Z & \cdots & Z \\ Z & C_2 & \cdots & Z \\ \vdots & & \ddots & \vdots \\ Z & Z & \cdots & C_r \end{pmatrix},$$

where each $C_i$ is the companion matrix of a particular polynomial which 
divides the minimal polynomial of $T$. Also the product of these polynomials 
is the characteristic polynomial of $T$. This form is canonical for the similarity 
class of matrices which represent $T$, and is called the rational canonical form 
(or sometimes the Jordan canonical form, a term that we reserve for the 
form derived in § 7.7). For further information you may consult the references 
given at the end of this book. 

We are now ready to prove the Hamilton-Cayley theorem. 

Theorem 7.6. (Hamilton-Cayley.) If $A$ is a matrix with characteristic 
equation 

$$(-1)^n \lambda^n + b_1 \lambda^{n-1} + \cdots + b_n = 0,$$

then 

$$(-1)^n A^n + b_1 A^{n-1} + \cdots + b_n I = Z.$$

proof: Given an $n \times n$ matrix $A$, choose any basis for $\mathcal{V}_n$ and let $T$ 
be the linear transformation represented by $A$ relative to that basis. 
Let $P$ be the characteristic polynomial of $A$, 

$$P(\lambda) = \det(A - \lambda I) = (-1)^n (\lambda^n + b_1 \lambda^{n-1} + \cdots + b_n),$$

and consider the transformation $P(T)$ defined by 

$$P(T) = (-1)^n (T^n + b_1 T^{n-1} + \cdots + b_n I).$$

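We shall show that $\xi P(T) = \theta$ for every $\xi \in \mathcal{V}_n$. Since $P(T)$ is 
represented by the matrix $P(A)$, we can then conclude that $P(A) = Z$. 

Let $\xi$ be any nonzero vector of $\mathcal{V}_n$, and let $\mathcal{S}$ be the cyclic 
subspace generated by $\xi$, say of dimension $k$. Then relative to the basis 
$\{\xi, \xi T, \ldots, \xi T^{k-1}\}$ for $\mathcal{S}$, the transformation $T_{\mathcal{S}}$ is represented by the 
matrix $C_1$ given by (7.1) above. The characteristic polynomial $Q_1$ of $C_1$ 
is given by 

$$Q_1(\lambda) = \det(C_1 - \lambda I) = (-1)^k (\lambda^k - c_k \lambda^{k-1} - \cdots - c_1),$$

where the coefficients $c_i$ are obtained from the relation 

$$\xi T^k = c_1 \xi + c_2 \, \xi T + \cdots + c_k \, \xi T^{k-1}.$$

Since $T$ and $T_{\mathcal{S}}$ coincide on $\mathcal{S}$, we combine the last two equations to 
obtain 

$$\theta = \xi T_{\mathcal{S}}^{k} - (c_k \, \xi T_{\mathcal{S}}^{k-1} + \cdots + c_2 \, \xi T_{\mathcal{S}} + c_1 \xi)$$

$$= \xi [T_{\mathcal{S}}^{k} - c_k T_{\mathcal{S}}^{k-1} - \cdots - c_1 I]$$

$$= (-1)^k \, \xi Q_1(T_{\mathcal{S}}).$$

Now extend this basis for $\mathcal{S}$ to a basis for $\mathcal{V}_n$. Then $T$ is represented by 
a matrix $C$, which is similar to $A$ and of the form 

$$C = \begin{pmatrix} C_1 & Z \\ C_3 & C_4 \end{pmatrix}.$$

Then 

$$P(\lambda) = \det(A - \lambda I) = \det(C - \lambda I)$$

$$= \det(C_1 - \lambda I) \det(C_4 - \lambda I)$$

$$= Q_1(\lambda) Q_4(\lambda),$$

where $Q_4$ is the characteristic polynomial of $C_4$. Hence $P(T_{\mathcal{S}}) = 
Q_1(T_{\mathcal{S}}) Q_4(T_{\mathcal{S}})$. But $\xi \in \mathcal{S}$, so 

$$\xi P(T) = \xi P(T_{\mathcal{S}}) = \xi Q_1(T_{\mathcal{S}}) Q_4(T_{\mathcal{S}}) = \theta Q_4(T_{\mathcal{S}}) = \theta,$$

as we wished to prove. 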

A stronger form of invariance occurs in connection with familiar mappings 
called projections which are also intimately related to direct sums. A 
projection $E$ is defined simply as any idempotent linear transformation: 

$$E^2 = E.$$

In Exercise 5, § 3.6, it was shown that any idempotent linear mapping is the 
identity transformation on its range space $\mathcal{R}_E$. Hence not only is $\mathcal{R}_E$ invariant 
under $E$ as a subspace, but each point of $\mathcal{R}_E$ is invariant: $\xi E = \xi$ for every 
$\xi \in \mathcal{R}_E$. Furthermore, $\mathcal{R}_E \cap \mathcal{N}_E = [\theta]$, so $\mathcal{V} = \mathcal{R}_E \oplus \mathcal{N}_E$. Hence any 
projection $E$ on $\mathcal{V}$ decomposes $\mathcal{V}$ into the direct sum of two subspaces; $E$ is the 
identity mapping on one of these subspaces and the zero mapping on the other. 

Conversely, if $\mathcal{V} = \mathcal{M}_1 \oplus \mathcal{M}_2$, each $\xi \in \mathcal{V}$ has a unique expression 
$\xi = \mu_1 + \mu_2$, where $\mu_1 \in \mathcal{M}_1$ and $\mu_2 \in \mathcal{M}_2$. The mapping $T$ defined by 
$\xi T = \mu_1$ is linear and idempotent, and hence a projection, called the 
projection of $\mathcal{V}$ on $\mathcal{M}_1$ along $\mathcal{M}_2$. Clearly $\mathcal{M}_1 = \mathcal{R}_T$ and $\mathcal{M}_2 = \mathcal{N}_T$. 
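
For the plane, the projection on one line along another can be written down explicitly. The sketch below is illustrative only: the function names are hypothetical, integer coordinates are assumed, and row vectors act on the left of matrices as in the text. If the rows of $P$ are spanning vectors $\mu_1, \mu_2$ of the two lines, the projection on the first line along the second is $P^{-1} D P$ with $D = \mathrm{diag}(1, 0)$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_2x2(p):
    """Inverse of a 2 x 2 integer matrix; assumes a nonzero determinant."""
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    return [[Fraction(p[1][1], det), Fraction(-p[0][1], det)],
            [Fraction(-p[1][0], det), Fraction(p[0][0], det)]]


def projection(mu1, mu2):
    """Matrix E of the projection on [mu1] along [mu2] (row-vector convention).

    Writing xi = a1*mu1 + a2*mu2, the coordinates are (a1, a2) = xi P^(-1),
    and xi E = a1*mu1 = xi P^(-1) D P with D = diag(1, 0).
    """
    p = [list(mu1), list(mu2)]
    d = [[1, 0], [0, 0]]
    return mat_mul(mat_mul(inverse_2x2(p), d), p)


# Illustrative choice: project on the line through (1, 0) along the line through (1, 1).
E = projection((1, 0), (1, 1))
```

With this choice $(x, y)E = (x - y, 0)$, and one can confirm that $E^2 = E$ and that $I - E$ sends $(x, y)$ to $y(1, 1)$, the component along the second line.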

Projections have many important properties, some of which are developed 
in Exercises 3 to 6 of this section. Although we shall not make systematic 
use of projections, we shall see in the next section that they are closely 
related to the diagonability problem for matrices. You are urged to study 
these exercises and to prove the results stated therein. 

Exercises 

1. Refer to the linear transformation $T_3$ of Example (c), § 3.1. 

(i) Find two invariant subspaces such that $\mathcal{E}_2$ is the direct sum of 
these subspaces. 

(ii) Write the matrix $A$ for $T_3$ relative to the $\{\epsilon_1, \epsilon_2\}$ basis. 

(iii) Write the matrix $B$ for $T_3$ relative to a basis $\{\beta_1, \beta_2\}$, where each 
$\beta_i$ spans one of the invariant subspaces of (i). 

(iv) Find a matrix $P$ for which 

$$B = PAP^{-1}.$$

2. Prove that if $\mathcal{S}$ and $\mathcal{T}$ are $T$-invariant subspaces, then so are $\mathcal{S} \cap \mathcal{T}$ 
and $\mathcal{S} + \mathcal{T}$. 

3. Determine the possible characteristic values of a projection, and de- 
scribe a simple matrix representation of a projection. (See Exercise 5, § 3.7.) 

4. Let $\mathcal{V} = \mathcal{M}_1 \oplus \mathcal{M}_2$, and let $E_1$ and $T$ be linear mappings of $\mathcal{V}$. Prove 
the following theorems. 

(i) $E_1$ is the projection on $\mathcal{M}_1$ along $\mathcal{M}_2$ if and only if $I - E_1$ is the 
projection on $\mathcal{M}_2$ along $\mathcal{M}_1$. 

(ii) $\mathcal{M}_1$ is $T$-invariant if and only if $E_1 T E_1 = E_1 T$, where $E_1$ is the 
projection on $\mathcal{M}_1$ along $\mathcal{M}_2$. 

(iii) $\mathcal{M}_1$ and $\mathcal{M}_2$ are $T$-invariant if and only if $E_1 T = T E_1$, where $E_1$ is 
the projection on $\mathcal{M}_1$ along $\mathcal{M}_2$. 

5. Let $\{E_1, \ldots, E_k\}$ be a set of projections. These projections are called 
orthogonal if and only if $E_i E_j = Z$ whenever $i \neq j$, and are called supplementary 
if and only if $I = E_1 + \cdots + E_k$. Prove the following theorems. 

(i) If $E_1, \ldots, E_k$ are orthogonal, then $E_1, \ldots, E_k, I - \sum_{i=1}^{k} E_i$ are 
orthogonal and supplementary. 

(ii) If $E_1, \ldots, E_k$ are orthogonal and supplementary, then 

$$\mathcal{V} = \mathcal{R}_{E_1} \oplus \mathcal{R}_{E_2} \oplus \cdots \oplus \mathcal{R}_{E_k}.$$

(iii) Let $\mathcal{V} = \mathcal{S}_1 \oplus \mathcal{S}_2 \oplus \cdots \oplus \mathcal{S}_k$, let $\xi = \xi_1 + \xi_2 + \cdots + \xi_k$ where 
$\xi_i \in \mathcal{S}_i$, and let $E_i$ be defined for $i = 1, 2, \ldots, k$ by $\xi E_i = \xi_i$. Then 
$\{E_1, \ldots, E_k\}$ is a set of orthogonal and supplementary projections. 

(iv) Describe a simple matrix representation for each projection $E_i$ 
of (iii). 

6. Let $E$ be the projection of $\mathcal{V}$ on $\mathcal{M}_1$ along $\mathcal{M}_2$; let $F$ be the projection 
of $\mathcal{V}$ on $\mathcal{N}_1$ along $\mathcal{N}_2$. Prove that $E + F$ is a projection if and only if $E$ and $F$ 
are orthogonal, and in that case that $E + F$ is the projection on $\mathcal{M}_1 + \mathcal{N}_1$ 
along $\mathcal{M}_2 \cap \mathcal{N}_2$. 

§7.5. Diagonalization Theorems 

In this section we shall establish a variety of criteria for determining 
whether a given matrix $A$ is similar to a diagonal matrix $D$. Since the 
diagonal elements of $D$ are the characteristic values of $D$ (hence of $A$), we need 
only know that such a $D$ exists in order to write one, merely by placing the 
characteristic values of $A$ along the diagonal. 

Theorem 7.7. Each of the following conditions is necessary and 
sufficient that $A$ be similar to a diagonal matrix. 

(a) There exist $n$ linearly independent characteristic vectors of $A$. 

(b) For every row vector $X$ and scalar $\lambda$, if $X(A - \lambda I)^2 = Z$, then 
$X(A - \lambda I) = Z$. 

(c) If $X_0$ is a characteristic row vector corresponding to the characteristic 
value $\lambda_0$, then there is no row vector $Y$ such that $Y(A - \lambda_0 I) = X_0$. 

(d) There exists a scalar polynomial $P$ with distinct zeros such that 
$P(A) = Z$. 

(e) There exist an integer $r$, distinct scalars $a_1, \ldots, a_r$, and nonzero 
matrices $E_1, \ldots, E_r$ such that 

$$\sum_{i=1}^{r} a_i E_i = A,$$

$$\sum_{i=1}^{r} E_i = I,$$

$$E_i E_j = Z \quad \text{if } i \neq j.$$

proof : Theorem 7.5 has established that (a) implies that A can be 
diagonalized. We shall show that diagonalization implies (b), (b) im- 
plies (c), (c) implies (d), (d) implies (e), and (e) implies (a), thus com- 
pleting the cycle. 

Diagonalization implies (b): Let $D = PAP^{-1}$ be diagonal, let $Y = XP^{-1}$ 
for a given row vector $X$, and suppose $X(A - \lambda I)^2 = Z$ for some 
scalar $\lambda$. Then 

$$Z = YP(A - \lambda I)^2 = YP(P^{-1}DP - \lambda I)^2 = Y(D - \lambda I)^2 P.$$

Since $P$ is nonsingular, $Y(D - \lambda I)^2 = Z$, and hence $Y(D - \lambda I) = Z$ 
since $D - \lambda I$ is diagonal (Exercise 2, § 7.1). Thus $X(A - \lambda I)P^{-1} = Z$, 
so $X(A - \lambda I) = Z$. 

(b) implies (c): Let $\lambda_0$ be a characteristic value and $X_0$ a corresponding 
characteristic row vector. Then $X_0 A = \lambda_0 X_0$. If a vector $Y$ exists 
such that $Y(A - \lambda_0 I) = X_0$, then we have 

$$Y(A - \lambda_0 I)^2 = X_0 (A - \lambda_0 I) = Z,$$

so by (b) we have 

$$Y(A - \lambda_0 I) = Z = X_0.$$

This contradicts the hypothesis that $X_0$ is a characteristic row vector, 
so no vector $Y$ exists with the stated property. 

(c) implies (d): We prove that the minimal polynomial $M$ of $A$ has 
distinct zeros if (c) is satisfied. Assume that $M$ has $\lambda_0$ as a repeated 
zero. Then 

$$M(x) = f(x)(x - \lambda_0)^2.$$

Since $M$ is minimal, $f(A)(A - \lambda_0 I)$ is a nonzero matrix, and for 
some row vector $Y_0$, 

$$Y_0 [f(A)(A - \lambda_0 I)] \neq Z.$$

Let us call this nonzero row vector $X_0$. Then 

$$X_0 (A - \lambda_0 I) = Y_0 f(A)(A - \lambda_0 I)^2 = Y_0 M(A) = Z$$

since $M(A) = Z$. Hence $X_0$ is characteristic, corresponding to $\lambda_0$. 
But then the equation 

$$Y(A - \lambda_0 I) = X_0$$

has the solution $Y_0 f(A)$, which contradicts (c). 

(d) implies (e): Let $P$ be a scalar polynomial of degree $r$ such that its 
zeros $a_1, \ldots, a_r$ are distinct and such that $P(A) = Z$. Without loss 
of generality we can assume that no polynomial of degree less than $r$ 
has these properties. We define $r$ other polynomials $p_i$ by the relations 

$$P(x) = (x - a_i) p_i(x), \quad i = 1, \ldots, r.$$

Then $p_i(a_j) = 0$ if and only if $i \neq j$. Let the polynomial $g$ be defined 
by 

$$g(x) = 1 - \sum_{i=1}^{r} [p_i(a_i)]^{-1} p_i(x),$$

and observe that $g$ is a polynomial of degree less than $r$ such that 
$g(a_k) = 0$ for $k = 1, \ldots, r$. Hence $g(x) = 0$ for every $x$, and each 
coefficient of $g$ is zero. Hence $g(B) = Z$ for every matrix $B$. Now 
consider the $r$ matrices 

$$E_i = [p_i(a_i)]^{-1} p_i(A), \quad i = 1, \ldots, r.$$

By our choice of $r$, $E_i \neq Z$. We have $\sum_{i=1}^{r} E_i = I$. Also, 

$$P(A) = (A - a_i I) p_i(A) = Z,$$

so 

$$a_i p_i(A) = A p_i(A).$$

Thus 

$$\sum_{i=1}^{r} a_i E_i = \sum_{i=1}^{r} [p_i(a_i)]^{-1} A p_i(A) = A \sum_{i=1}^{r} E_i = A.$$

Finally, if $i \neq j$, we have 

$$E_i E_j = c P(A) h(A),$$

where $c$ is a scalar and $h(A)$ is the product of the $r - 2$ matrices 
$A - a_k I$, where $k \neq i$ and $k \neq j$. Therefore $E_i E_j = Z$ if $i \neq j$. 

(e) implies (a): Assuming (e), we shall show that for an arbitrary 
vector $X$, $XE_i$ is either characteristic or zero: 

$$(XE_i)A = XE_i \Big[ \sum_{j=1}^{r} a_j E_j \Big] = X \sum_{j=1}^{r} a_j E_i E_j = a_i X E_i^2 = a_i (XE_i).$$

The last equality follows from the fact that the matrices $E_i$ are 
idempotent. (See Exercise 4.) In addition, we have 

$$X = XI = X \sum_{i=1}^{r} E_i = \sum_{i=1}^{r} (XE_i),$$

so that any vector is a linear combination of characteristic vectors, 
which means that there are $n$ linearly independent characteristic 
vectors. The proof of Theorem 7.7 is now complete. 
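
The matrices $E_i = [p_i(a_i)]^{-1} p_i(A)$ constructed in the step from (d) to (e) are easy to compute once the distinct zeros are known. The following sketch is illustrative: the function names are hypothetical, exact rational arithmetic is assumed, and the caller is assumed to supply distinct zeros of a polynomial that annihilates $A$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_projections(a, zeros):
    """Return [E_1, ..., E_r] with E_i = p_i(A) / p_i(a_i).

    Here p_i(x) is the product of (x - a_j) over j != i, so E_i is built as
    the product of the matrices (A - a_j I) / (a_i - a_j) over j != i.
    """
    n = len(a)
    projections = []
    for i, ai in enumerate(zeros):
        e = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
        for j, aj in enumerate(zeros):
            if j != i:
                factor = [[(Fraction(a[r][c]) - (aj if r == c else 0)) / (ai - aj)
                           for c in range(n)] for r in range(n)]
                e = mat_mul(e, factor)
        projections.append(e)
    return projections


# Illustrative matrix with distinct characteristic values 1 and 2.
A = [[1, 1], [0, 2]]
E1, E2 = spectral_projections(A, [1, 2])
```

For this matrix the sketch gives $E_1 = \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}$ and $E_2 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$, and one can check directly that $E_1 + E_2 = I$, $E_1 E_2 = Z$, and $1 \cdot E_1 + 2 \cdot E_2 = A$.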


Of these five criteria for diagonability the first is perhaps the most useful 
for the problems we are considering. Given an $n \times n$ matrix $A$, we need only 
compute the characteristic vectors and see if there are $n$ of these which are 
linearly independent. This usually requires a considerable amount of 
computation. However, if the $n$ characteristic values of $A$ are distinct, the 
diagonalization problem is easily solved. In that event, $A$ is similar to a 
diagonal matrix because the characteristic values are the distinct zeros of the 
characteristic polynomial $f$, and $f(A) = Z$ by the Hamilton-Cayley theorem, 
so that condition (d) of Theorem 7.7 is satisfied. 
Alternatively we could argue that by Exercise 9, § 7.1, $n$ distinct 
characteristic values determine a set of $n$ linearly independent vectors, which guarantees 
diagonability. In case the characteristic values are not all distinct, $n$ linearly 
independent vectors might still exist. This will occur whenever $k$ linearly 
independent characteristic vectors correspond to each characteristic value 
of multiplicity $k$. 

Theorem 7.8. A sufficient (but not necessary) condition that A be 
similar to a diagonal matrix is that the characteristic values of A are 
distinct. 
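
Two small matrices, chosen here only as an illustration, show why the condition is sufficient but not necessary. Both have the single characteristic value 1 with multiplicity 2; the first is already diagonal, while the second fails criterion (b) of Theorem 7.7.

```latex
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\quad \text{is diagonal although its characteristic values are not distinct.}
\qquad
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad
A - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
(A - I)^2 = Z,
\qquad
X = (1, 0): \quad X(A - I)^2 = Z \ \text{ but } \ X(A - I) = (0, 1) \neq Z .
```

Hence repeated characteristic values leave the question open, and one of the criteria of Theorem 7.7 must be applied.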

Statement (e) of Theorem 7.7 also deserves special comment. Interpreted 
in terms of linear transformations, it says that a linear mapping $T$ can be 
represented by a diagonal matrix if and only if $T$ is a linear combination of 
a set of orthogonal and supplementary projections (Exercise 5, § 7.4). The 
coefficients $a_i$ of that linear combination are the distinct characteristic values 
of $T$, as we observed in the last section of the proof of Theorem 7.7. 
Furthermore, $E_i$ is a projection on the space $\mathcal{C}_{a_i}$ spanned by the characteristic vectors 
associated with $a_i$. The decomposition 

$$T = \sum_{i=1}^{r} a_i E_i$$

is sometimes called the spectral form of $T$, and the diagonability problem 
for matrices is equivalent to the problem of characterizing those linear 
mappings that possess a spectral form. We shall return to this problem 
in § 8.0. 


Exercises 


1. Determine whether each of the following matrices is similar to a diag- 
onal matrix. (Exercise 7, § 7.3, may be of assistance.) 


(i) 


(ii) 



(iii) 


(iv) 


0 1 0\ 

0 0 1 ) 

0 -9 6/ 

<0 1 0 

0 0 1 

0 0 0 

v0 -4 4 


2. Show that the following n X n matrices are similar: 


4 = 1 


1 . . . 1 

1 . . . 1 


I. 



0 

0 

1 

1 


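1. • • 1, 

3. Show that the following $n \times n$ matrices are similar: 

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad \begin{pmatrix} \epsilon_1 & 0 & 0 & \cdots & 0 \\ 0 & \epsilon_2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \epsilon_n \end{pmatrix},$$

where $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are the $n$ distinct $n$th roots of unity, 

$$\epsilon_k = \cos \frac{2\pi k}{n} + i \sin \frac{2\pi k}{n}.$$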

4. Let $E_1, \ldots, E_r$ be a set of matrices which have the second and third 
properties of Theorem 7.7 (e). Prove that each $E_i$ is idempotent. 

5. Show that if $A = \sum_{i=1}^{r} a_i E_i$, where the $E_i$ have the properties listed 
in Theorem 7.7 (e), then each $a_i$ is a characteristic value of $A$. Find a 
corresponding characteristic vector. 

6. Give a necessary and sufficient condition for the diagonability of the 
block matrix 



in terms of the diagonability of the square blocks A and D. 


§7.6. Nilpotent Transformations 

Let us summarize the results concerning diagonalization obtained thus far. 
Any square matrix is equivalent to a diagonal matrix, and a matrix is similar 
to a diagonal matrix if and only if the characteristic vectors span the full 
space. Since similar matrices represent the same transformation, we are 
particularly interested in the second result, and we recall that the diagonal 
entries must be the characteristic values. The next question is this: Suppose 
A cannot be diagonalized by a change of coordinates; how close can we come 
to diagonalizing A? Or, is there some simple matrix form such that every 
matrix is similar to a matrix in this canonical form and such that the form is 
almost diagonal? We shall show that A is similar to a matrix J which is the 
sum of two matrices, J = D + N, where D has zeros everywhere except on 
the main diagonal and N has zeros everywhere except on the superdiagonal. 
Thus N is nilpotent. Furthermore, the diagonal elements of D are the charac- 
teristic values of A, and the superdiagonal elements of N are either 0 or 1. 
An example of a matrix in this form is 

$$J = \begin{pmatrix} \lambda_1 & 1 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 & 0 \\ 0 & 0 & \lambda_1 & 0 & 0 \\ 0 & 0 & 0 & \lambda_2 & 1 \\ 0 & 0 & 0 & 0 & \lambda_2 \end{pmatrix}.$$

To be more precise, let $T$ be a linear mapping on $\mathcal{V}_n$ having $r$ distinct 
characteristic values, $\lambda_1, \ldots, \lambda_r$. We shall show that there exists a basis 
for $\mathcal{V}_n$, relative to which $T$ is represented by a diagonal block matrix of 
$r$ blocks, 

$$J = \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{pmatrix}.$$

The size of the block $A_i$ is $s_i$, the multiplicity of $\lambda_i$ as a characteristic value 
of $T$; hence $s_1 + s_2 + \cdots + s_r = n$. Furthermore, for each $i$ the block $A_i$ is 
itself a diagonal block matrix of $k(i)$ blocks, 

$$A_i = \begin{pmatrix} B_{i1} & & & \\ & B_{i2} & & \\ & & \ddots & \\ & & & B_{ik(i)} \end{pmatrix}.$$

The blocks can be arranged in nonincreasing order of size; if $p_{ij}$ denotes the 
size of the block $B_{ij}$, then for $i = 1, 2, \ldots, r$, 

$$p_{i1} \geq p_{i2} \geq \cdots \geq p_{ik(i)},$$

$$p_{i1} + p_{i2} + \cdots + p_{ik(i)} = s_i.$$

For each $i = 1, 2, \ldots, r$, each diagonal element of $A_i$ (and hence of each $B_{ij}$) 
is $\lambda_i$. Each superdiagonal element of $J$ which lies within one of the 
sub-blocks $B_{ij}$ is 1; each superdiagonal element of $J$ which lies between adjacent 
blocks or sub-blocks is 0, as is every other element of $J$. 

To show the existence of such a basis, it is clear that we shall make frequent 
use of the concepts of direct sum and invariant spaces. Also since $A_i - \lambda_i I$ 
is a nilpotent matrix, we can anticipate that a detailed study of nilpotent 
transformations will be required. We now undertake such a study. 

From Exercise 8, § 3.2, we recall that if $T$ is nilpotent of index $p$ on $\mathcal{V}$, 
and if $\xi T^{p-1} \neq \theta$, then $\{\xi, \xi T, \ldots, \xi T^{p-1}\}$ is linearly independent. In the 
terminology of Definition 7.6, $\{(\xi)_T\}$ is a basis for the $T$-invariant, $T$-cyclic 
subspace generated by $\xi$. We now show that $\mathcal{V}$ is a direct sum of such spaces, 
proving that result in a form given by H. F. Trotter. 

Theorem 7.9. If $T$ is nilpotent of index $p$ on $\mathcal{V}$, then there exists a 
$T$-cyclic basis for $\mathcal{V}$. Moreover, any set of vectors $\{\xi_1, \ldots, \xi_r\}$ having 
the property that 

$$\sum_{i=1}^{r} c_i \, \xi_i T^{p-1} = \theta \quad \text{only if every } c_i = 0$$

can be extended to a set $\{\xi_1, \ldots, \xi_r, \zeta_1, \ldots, \zeta_s\}$ such that 

$$\mathcal{V} = [(\xi_1)_T] \oplus \cdots \oplus [(\xi_r)_T] \oplus [(\zeta_1)_T] \oplus \cdots \oplus [(\zeta_s)_T].$$

proof: We proceed by induction on $p$. If $p = 1$, $T$ is the zero mapping 
and any linearly independent set $\{\xi_1, \ldots, \xi_r\}$ can be extended to a basis 
$\{\xi_1, \ldots, \xi_r, \eta_1, \ldots, \eta_s\}$ for $\mathcal{V}$ for which all assertions of the theorem 
are valid. For $p > 1$, we first observe that $T$ is nilpotent of index $p - 1$ 
on the $T$-invariant space $\mathcal{N}_{T^{p-1}}$. Let $\{\alpha_1, \ldots, \alpha_t\}$ be any basis for 
$\mathcal{N}_{T^{p-1}}$ and let $\{\alpha_1, \ldots, \alpha_t, \beta_1, \ldots, \beta_{n-t}\}$ be any extension to a basis for $\mathcal{V}$. 
Then the $\beta_i$ must satisfy the special property stated in the theorem, 
because if 

$$\sum_{i=1}^{n-t} b_i \, \beta_i T^{p-1} = \theta,$$

then 

$$\sum_{i=1}^{n-t} b_i \beta_i \in \mathcal{N}_{T^{p-1}};$$

since the $\alpha$'s and $\beta$'s are linearly independent, each $b_i = 0$. In particular, 
any $\beta_i$ used in the extension must be such that $\beta_i T^{p-1} \neq \theta$, and such 
vectors exist in $\mathcal{V}$ since $T$ is nilpotent of index $p$ on $\mathcal{V}$. 

Let $\{\xi_1, \xi_2, \ldots, \xi_r\}$ be a maximal set of vectors of $\mathcal{V}$ satisfying the 
special property 

$$\sum_{i=1}^{r} c_i \, \xi_i T^{p-1} = \theta \quad \text{only if every } c_i = 0.$$

It is easily verified that $\{\alpha_1, \ldots, \alpha_t, \xi_1, \ldots, \xi_r\}$ is linearly independent. 
Furthermore, $\mathcal{V} = [\alpha_1, \ldots, \alpha_t, \xi_1, \ldots, \xi_r]$, for otherwise we could extend 
that set to a basis $\{\alpha_1, \ldots, \alpha_t, \xi_1, \ldots, \xi_r, \xi_{r+1}, \ldots, \xi_{n-t}\}$ for $\mathcal{V}$. But then 
$\{\xi_1, \ldots, \xi_{n-t}\}$ would satisfy the special property, contradicting the 
maximality of $\{\xi_1, \ldots, \xi_r\}$. 

Now let $\eta_i = \xi_i T$. Then $\eta_i \in \mathcal{N}_{T^{p-1}}$ and 

$$\sum_{i=1}^{r} c_i \, \eta_i T^{p-2} = \theta \quad \text{only if every } c_i = 0.$$

By the induction hypothesis, $\mathcal{N}_{T^{p-1}}$ has a $T$-cyclic basis, and moreover the 
set $\{\eta_1, \ldots, \eta_r\}$ can be extended to a set $\{\eta_1, \ldots, \eta_r, \zeta_1, \ldots, \zeta_s\}$ such 
that $\{(\eta_1)_T, \ldots, (\eta_r)_T, (\zeta_1)_T, \ldots, (\zeta_s)_T\}$ is a $T$-cyclic basis for 
$\mathcal{N}_{T^{p-1}}$. Using the vectors of this basis in place of the $\alpha$'s above, 
we obtain a basis for $\mathcal{V}$. Also since $\{(\eta_i)_T\} \cup \{\xi_i\} = \{(\xi_i)_T\}$, then 
$\{(\xi_1)_T, \ldots, (\xi_r)_T, (\zeta_1)_T, \ldots, (\zeta_s)_T\}$ is a $T$-cyclic basis for $\mathcal{V}$, as the 
theorem asserts. 

Thus any nilpotent transformation $T$ on $\mathcal{V}_n$ can be used to decompose $\mathcal{V}_n$ 
into a direct sum of a certain number of $T$-invariant $T$-cyclic subspaces, 
say $[(\xi_i)_T]$, $i = 1, 2, \ldots, k$. If $\xi_i T^{p_i} = \theta$ but $\xi_i T^{p_i - 1} \neq \theta$, then $[(\xi_i)_T]$ is of 
dimension $p_i$, since $\{\xi_i, \xi_i T, \ldots, \xi_i T^{p_i - 1}\}$ is linearly independent. By suitably 
rearranging the indices, we can arrange these $T$-cyclic subspaces according 
to their dimensions, in nonincreasing order. Then since at least one such 
subspace is of dimension $p$, we have 

$$p = p_1 \geq p_2 \geq \cdots \geq p_k,$$

$$n = p_1 + p_2 + \cdots + p_k.$$

Our next objective is to show that the positive integers $k, p_1, \ldots, p_k$ are 
uniquely determined by $T$, even though the vectors $\xi_i$ are not. To do so, 
write the basis vectors in an array of the form 

$$\begin{array}{llll} \xi_1 T^{p_1 - 1}, & \xi_1 T^{p_1 - 2}, & \ldots, & \xi_1 \\ \xi_2 T^{p_2 - 1}, & \xi_2 T^{p_2 - 2}, & \ldots, & \xi_2 \\ \quad \vdots & & & \\ \xi_k T^{p_k - 1}, & \ldots, & \xi_k & \end{array}$$

The rows of the array are of nonincreasing length: $p_1 \geq p_2 \geq \cdots \geq p_k$. 
Likewise the columns are of nonincreasing length. The first column is of 
length $k$, and the second column is of smaller length only if $p_k = 1$. Indeed, 
the length of the second column is simply the number of vectors $\xi_i$ for which 
$p_i \geq 2$ (that is, the number of rows of length 2 or more). Similarly the length 
of column $j$ is the number of $\xi_i$ for which $p_i \geq j$. 

Since $\xi_i T^{p_i} = \theta$ for each $i$, the first column of the array is a basis for $\mathcal{N}_T$. 
Hence $k = \nu(T)$, a number which is uniquely determined by $T$. In general, 
the vectors of the first $j$ columns form a basis for $\mathcal{N}_{T^j}$. Hence the length of 
column $j$ is simply $\nu(T^j) - \nu(T^{j-1})$, which is uniquely determined by $T$. Thus 
$T$ uniquely determines the number $k$ of rows in the array, the length $p = p_1$ 
of the first row, and the length of each column. Hence the complete 
arrangement, including the lengths $p_1, p_2, \ldots, p_k$ of the rows, is uniquely determined 
by $T$. 
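
The counting argument above translates directly into a computation: the nullities $\nu(T^j)$ give the column lengths of the array, and the row lengths $p_1, \ldots, p_k$ are recovered by counting columns. The sketch below is illustrative, with hypothetical function names and exact rational arithmetic assumed; it also builds a block matrix with 1 in each superdiagonal position of every block so that the recovery can be tested.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(mat):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def standard_nilpotent(sizes):
    """Diagonal block matrix whose blocks have 1 in each superdiagonal position."""
    n = sum(sizes)
    mat = [[0] * n for _ in range(n)]
    start = 0
    for p in sizes:
        for i in range(start, start + p - 1):
            mat[i][i + 1] = 1
        start += p
    return mat


def block_sizes(t):
    """Recover p_1 >= ... >= p_k of a nilpotent matrix from its nullities."""
    n = len(t)
    column_lengths = []  # entry j-1 holds nu(T^j) - nu(T^(j-1))
    previous_nullity = 0
    power = t
    for _ in range(n):
        nullity = n - rank(power)
        column_lengths.append(nullity - previous_nullity)
        previous_nullity = nullity
        if nullity == n:
            break
        power = mat_mul(power, t)
    else:
        raise ValueError("matrix is not nilpotent")
    k = column_lengths[0]
    # Row i of the array is as long as the number of columns reaching past it.
    return [sum(1 for length in column_lengths if length > i) for i in range(k)]
```

Because only ranks of powers enter the computation, any matrix similar to a given nilpotent matrix yields the same list of integers, which is the uniqueness statement in computational form.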

Theorem 7.10. If $T$ is nilpotent of index $p$ on $\mathcal{V}_n$, then there exist 
uniquely determined positive integers $k, p_1, \ldots, p_k$ such that 

$$p = p_1 \geq p_2 \geq \cdots \geq p_k,$$

$$n = p_1 + p_2 + \cdots + p_k.$$

Also there exist vectors $\xi_1, \ldots, \xi_k$ such that $[(\xi_i)_T]$ is a $T$-cyclic, 
$T$-invariant subspace of dimension $p_i$ and 

$$\mathcal{V}_n = [(\xi_1)_T] \oplus \cdots \oplus [(\xi_k)_T].$$

Relative to the corresponding cyclic basis for $\mathcal{V}_n$, $T$ is represented by a 
diagonal block matrix 

$$B = \begin{pmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_k \end{pmatrix},$$

where $B_i$ is a $p_i \times p_i$ square matrix with 1 in each superdiagonal 
position. All other entries of each $B_i$ and of $B$ are zero. 

Theorem 7.11. Any nilpotent matrix is similar to a matrix with a 
certain arrangement of 1 and 0 on the superdiagonal and 0 elsewhere. 
More precisely, the superdiagonal elements consist of several sequences 
of consecutive 1's separated by a single 0, the sequences of 1's being 
of nonincreasing, nonnegative length. 

proof: If $A$ is nilpotent on $\mathcal{V}_n$, choose any basis for $\mathcal{V}_n$ and let $T$ be 
the linear transformation represented by $A$. Then change bases to a 
$T$-cyclic basis of the form described in Theorem 7.10, obtaining the 
numbers $k, p_1, \ldots, p_k$ determined by $T$. The matrix $B$ which 
represents $T$ relative to the new basis is similar to $A$. Also $B$ is of diagonal 
block form with $k$ square blocks, $B_1, \ldots, B_k$. The dimension of $B_i$ is $p_i$, 
and $B_i$ has $p_i - 1$ superdiagonal elements, each of them 1. Hence the 
superdiagonal of $B$ is an arrangement of 1 and 0 as follows: 

$$\underbrace{(1\ 1 \cdots 1)}_{p_1 - 1}\ 0\ \underbrace{(1\ 1 \cdots 1)}_{p_2 - 1}\ 0 \cdots \underbrace{(1\ 1 \cdots 1)}_{p_{k-1} - 1}\ 0\ \underbrace{(1\ 1 \cdots 1)}_{p_k - 1}$$

The sequences of 1 are of nonincreasing length, since $p_1 \geq p_2 \geq \cdots \geq p_k$; 
in particular, if any $p_i = 1$, the superdiagonal of $B$ will end with a string 
of consecutive 0's. 

Theorem 7.12. Let $T$ and $S$ be nilpotent linear transformations on $\mathcal{V}_n$. 
In the notation of Theorem 7.10 let $T$ determine the integers $k$, 
$p_1 \geq p_2 \geq \cdots \geq p_k$; let $S$ determine the integers $j$, $q_1 \geq q_2 \geq \cdots \geq q_j$. 
Then $k = j$ and $p_i = q_i$ for $i = 1, \ldots, k$ if and only if there exists a 
nonsingular transformation $R$ on $\mathcal{V}_n$ such that 

$$S = RTR^{-1}.$$

proof: Suppose $k = j$ and $p_i = q_i$ for $i = 1, \ldots, k$. As in Theorem 7.10 
choose as a basis for $\mathcal{V}_n$ the vectors 

$$\begin{array}{l} \xi_1, \ \xi_1 T, \ \ldots, \ \xi_1 T^{p_1 - 1} \\ \xi_2, \ \xi_2 T, \ \ldots, \ \xi_2 T^{p_2 - 1} \\ \quad \vdots \\ \xi_k, \ \xi_k T, \ \ldots, \ \xi_k T^{p_k - 1}. \end{array}$$

Relative to this basis, $T$ is represented by a matrix $N$ whose superdiagonal 
has 1 in the first $p_1 - 1$ positions, followed by 0, then 1 in the next $p_2 - 1$ 
positions, followed by 0, and so on. All other entries of $N$ are zero. In 
the same way, choose a basis according to the properties of $S$. The 
matrix representing $S$ in this basis will also be $N$, so by Theorem 6.20, 
$S = RTR^{-1}$ for some nonsingular linear transformation $R$. 

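Conversely, suppose $S = RTR^{-1}$. Choose as a basis for $\mathcal{V}_n$ the vectors 

$$\begin{array}{l} \eta_1, \ \eta_1 S, \ \ldots, \ \eta_1 S^{q_1 - 1} \\ \eta_2, \ \eta_2 S, \ \ldots, \ \eta_2 S^{q_2 - 1} \\ \quad \vdots \\ \eta_j, \ \eta_j S, \ \ldots, \ \eta_j S^{q_j - 1}, \end{array}$$

as guaranteed by Theorem 7.10. Since $R$ is nonsingular, we obtain a 
new basis by mapping each of these basis vectors by $R$. But if $t > 0$, 

$$\eta_i S^t R = \eta_i S^{t-1}(SR) = \eta_i S^{t-1}(RT) = (\eta_i S^{t-1} R)T.$$

Hence if we let $\xi_i = \eta_i R$, the new basis can be written 

$$\begin{array}{l} \xi_1, \ \xi_1 T, \ \ldots, \ \xi_1 T^{q_1 - 1} \\ \xi_2, \ \xi_2 T, \ \ldots, \ \xi_2 T^{q_2 - 1} \\ \quad \vdots \\ \xi_j, \ \xi_j T, \ \ldots, \ \xi_j T^{q_j - 1}. \end{array}$$

But 

$$\xi_i T^{q_i} = \eta_i R T^{q_i} = \eta_i S R T^{q_i - 1} = \eta_i S^2 R T^{q_i - 2} = \cdots = \eta_i S^{q_i} R = \theta R = \theta.$$

Hence the $\xi$-basis is of the form of Theorem 7.10, and therefore $T$ 
determines the same integers as does $S$. 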

As a consequence of Theorem 7.12 we can show that for nilpotent $n \times n$ 
matrices the standard superdiagonal form described in Theorem 7.11 is 
canonical with respect to similarity. If $A$ is nilpotent, relative to a chosen 
basis $A$ represents a nilpotent linear transformation $T$. Relative to a different 
basis $A$ represents a linear transformation $S$, where $S = RTR^{-1}$. Relative 
to a $T$-cyclic basis, $T$ is represented by a nilpotent matrix $N_1$ in standard 
superdiagonal form. Relative to an $S$-cyclic basis, $S$ is represented by $N_2$, 
also in standard form. By Theorem 7.12 $S$ and $T$ determine the same set 
of integers $(k, p_1, \ldots, p_k)$, so $N_1 = N_2$. Hence any nilpotent matrix is similar 
to one and only one matrix in standard superdiagonal form. 


Exercises 

1. Prove that any matrix which is similar to a nilpotent matrix of index p 
is itself nilpotent of index p. 

2. Let

$$A = \begin{pmatrix} 0 & 2 & 1 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}$$

be the matrix of a transformation T on $\mathcal{E}_3$.

(i) Find the rank of A, the null space of A, and the index of
nilpotency of A.

(ii) Following the method of Theorem 7.10 and starting with
$\xi_1 = (1, -1, 0)$, find a basis for $\mathcal{E}_3$ such that with respect to this basis T is
represented by a matrix in the canonical form of Theorem 7.11.

(iii) Calculate this new matrix from the new basis, and find the integers
$k, p_1, \ldots, p_k$.

3. Let N be the $n \times n$ matrix which has 1 in each superdiagonal position
and zeros elsewhere.

(i) Prove that A and N commute if and only if A is of the form

$$A = \begin{pmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ 0 & a_1 & a_2 & \cdots & a_{n-1} \\ 0 & 0 & a_1 & \cdots & a_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_1 \end{pmatrix}.$$

(ii) If A is of the form above and if $a_2 \neq 0$, show that the space
spanned by the characteristic vectors of A is one-dimensional.

(iii) Let $k > 2$. If $a_2 = a_3 = \cdots = a_k = 0$ and $a_{k+1} \neq 0$, show that
the space spanned by the characteristic vectors of A is k-dimensional.

4. Find necessary and sufficient conditions that the matrix C of Exer- 
cise 7, §7.3, be nilpotent. 

5. Prove that if T is nilpotent of index p on $\mathcal{V}$, then $\mathcal{R}_{T^{p-k}} \subseteq \mathcal{N}_{T^k}$ for
$k = 1, 2, \ldots, p - 1$, with equality holding either for all values of k or for
none.

6. Let $f(n)$ denote the number of distinct $n \times n$ nilpotent matrices in
canonical superdiagonal form.

(i) Show that $f(i) = i$ for $i = 1, 2, 3$; $f(4) = 5$; $f(5) = 7$.

(ii) Try to develop a general formula for $f(n)$.

§ 7.7. Jordan Canonical Form

We have referred at various times to the difference between the properties
of field elements and the properties of matrices of field elements. One more
contrast is worth examining, namely, the existence of inverses. The inverse
of a field element b exists if and only if $b \neq 0$, so “inverse” and “nonzero”
are closely related notions. For a matrix B, the existence of $B^{-1}$ certainly
implies $B \neq Z$, but the reverse implication is not valid since nonzero matrices
may be singular. Again, the concept of nilpotency with positive index never
arises in a field, but it has come into our study of matrices in an essential way.
A nonzero nilpotent matrix, we feel, is an example of something which barely
escapes being zero, while a nonsingular matrix is quite the opposite. It is of
interest therefore to show that every matrix is a combination of a nonsingular
matrix and a nilpotent matrix.

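Theorem 7.13. Let T be any linear transformation on $\mathcal{V}$. There exist
subspaces $\mathcal{R}$ and $\mathcal{S}$ such that

(a) $\mathcal{R}$ and $\mathcal{S}$ are T-invariant,

(b) $\mathcal{V} = \mathcal{R} \oplus \mathcal{S}$,

(c) T, restricted to $\mathcal{R}$, is nonsingular, and T, restricted to $\mathcal{S}$, is nilpotent.

proof: We consider successive powers of T. From Theorem 3.5 we
recall that the range and null spaces of powers of T form chains such that

$$\mathcal{V} \supseteq \mathcal{R}_T \supseteq \mathcal{R}_{T^2} \supseteq \cdots \supseteq \mathcal{R}_{T^k} = \mathcal{R}_{T^{k+1}} = \cdots$$

$$[\theta] \subseteq \mathcal{N}_T \subseteq \mathcal{N}_{T^2} \subseteq \cdots \subseteq \mathcal{N}_{T^k} = \mathcal{N}_{T^{k+1}} = \cdots$$

for some integer $k > 0$. Let $\mathcal{R} = \mathcal{R}_{T^k}$ and $\mathcal{S} = \mathcal{N}_{T^k}$, and suppose
$\xi \in \mathcal{R} \cap \mathcal{S}$. Then $\xi = \eta T^k$ for some $\eta \in \mathcal{V}$, and $\xi T^k = \theta$. Hence
$\theta = \xi T^k = (\eta T^k)T^k$. Hence $\eta \in \mathcal{N}_{T^{2k}} = \mathcal{N}_{T^k}$, so $\theta = \eta T^k = \xi$. Now $\mathcal{R}$
and $\mathcal{S}$ have only $\theta$ in common, and the sum of their dimensions is n.
Hence

$$\mathcal{V} = \mathcal{R} \oplus \mathcal{S}.$$

Also, $\mathcal{R}_{T^k} T = \mathcal{R}_{T^{k+1}} = \mathcal{R}_{T^k}$, and similarly for $\mathcal{S}$, so $\mathcal{R}$ and $\mathcal{S}$ are T-invariant.
Now T maps $\mathcal{R}$ onto $\mathcal{R}$, so $T_{\mathcal{R}}$ is nonsingular. The vectors of $\mathcal{S}$ are mapped
into $\theta$ by $T^k$, so $T_{\mathcal{S}}$ is nilpotent.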
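
The integer k of the proof can be found by computing ranks of successive powers until they stop decreasing. The Python sketch below is an illustration under our own assumptions: the helper names `matmul`, `rank`, and `stable_power` are hypothetical, the entries are rational so that `fractions.Fraction` gives exact arithmetic, and the search starts at k = 1. The rank of $T^k$ is the dimension of $\mathcal{R}$, and n minus that rank is the dimension of $\mathcal{S}$.

```python
from fractions import Fraction


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def stable_power(A):
    """Return (k, rank of A^k) for the first k >= 1 with rank A^k = rank A^(k+1)."""
    k, power = 1, A
    while True:
        nxt = matmul(power, A)
        if rank(nxt) == rank(power):
            return k, rank(power)
        k, power = k + 1, nxt


# A nilpotent 2 x 2 block together with a nonsingular 1 x 1 block.
A = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 2]]
k, dim_R = stable_power(A)
dim_S = len(A) - dim_R
print(k, dim_R, dim_S)  # 2 1 2
```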

The next theorem, which is stated in terms of linear transformations, gives 
us the Jordan canonical form for a matrix, as stated at the beginning of § 7.6. 

Theorem 7.14. Let T be a linear transformation with the distinct
characteristic values $\lambda_1, \ldots, \lambda_r$, and let $s_i$ be the multiplicity of $\lambda_i$ for
$i = 1, \ldots, r$. Then $\mathcal{V}$ is the direct sum of r subspaces,

$$\mathcal{V} = \mathcal{S}_1 \oplus \mathcal{S}_2 \oplus \cdots \oplus \mathcal{S}_r,$$

such that for $i = 1, \ldots, r$

(a) $\mathcal{S}_i$ is T-invariant,

(b) $\mathcal{S}_i$ is of dimension $s_i$,

(c) when restricted to the space $\mathcal{S}_i$, T has the form $T_{\mathcal{S}_i} = \lambda_i I + N_i$,
where $N_i$ is nilpotent.

proof: Given T, we consider the linear transformation $T_1$ defined by

$$T_1 = T - \lambda_1 I$$

and apply Theorem 7.13 to obtain $T_1$-invariant subspaces $\mathcal{S}_1$ and $\mathcal{R}_1$ such
that

$$\mathcal{V} = \mathcal{S}_1 \oplus \mathcal{R}_1,$$

where $T_1$ is nilpotent on $\mathcal{S}_1$ and nonsingular on $\mathcal{R}_1$. Clearly, $\mathcal{S}_1$ and $\mathcal{R}_1$
are T-invariant, since $T = T_1 + \lambda_1 I$; also $T_{\mathcal{S}_1} = (T_1 + \lambda_1 I)_{\mathcal{S}_1}$ is of the
form (c), since $T_1$ is nilpotent on $\mathcal{S}_1$. We next prove $s_1 = \dim \mathcal{S}_1$. Since
$\mathcal{S}_1$ and $\mathcal{R}_1$ form a direct sum, we may choose any basis for $\mathcal{S}_1$ and any
basis for $\mathcal{R}_1$, combining them to give a basis for $\mathcal{V}$. With respect to any
such basis, T is represented by a matrix of the block form

$$A = \begin{pmatrix} A_{\mathcal{S}_1} & Z \\ Z & A_{\mathcal{R}_1} \end{pmatrix},$$

where $A_{\mathcal{S}_1}$ and $A_{\mathcal{R}_1}$ represent T restricted to the T-invariant spaces $\mathcal{S}_1$
and $\mathcal{R}_1$, respectively. Hence $A_{\mathcal{R}_1} - \lambda_1 I_{\mathcal{R}_1}$ represents $T_1$ on $\mathcal{R}_1$. Now for
any $\lambda$,

$$\det(A - \lambda I) = \det(A_{\mathcal{S}_1} - \lambda I_{\mathcal{S}_1}) \cdot \det(A_{\mathcal{R}_1} - \lambda I_{\mathcal{R}_1}).$$

Since $T_1$ is nonsingular on $\mathcal{R}_1$, $\det(A_{\mathcal{R}_1} - \lambda_1 I_{\mathcal{R}_1}) \neq 0$, and the dimension
of $\mathcal{S}_1$ is at least as great as the multiplicity of the characteristic value $\lambda_1$;

$$\dim \mathcal{S}_1 \ge s_1.$$

On the other hand, $T_1$ is nilpotent on $\mathcal{S}_1$ and we apply Theorem 7.10 to
choose a basis for $\mathcal{S}_1$ such that $T_1$ restricted to $\mathcal{S}_1$ is represented by a
matrix with zeros and ones on the superdiagonal and zeros elsewhere.
Since $T_{\mathcal{S}_1} = (T_1 + \lambda_1 I)_{\mathcal{S}_1}$, the matrix $A_{\mathcal{S}_1}$ has $\lambda_1$ in every diagonal position,
zeros and ones on the superdiagonal, and zeros elsewhere. Thus $A_{\mathcal{S}_1}$ has
$\lambda_1$ as its only characteristic value, so

$$\dim \mathcal{S}_1 \le s_1.$$

Part (b) of the theorem follows from the two inequalities which we have
obtained.

Now all parts of the theorem are established for the case $i = 1$. We
next consider the transformation $T_{\mathcal{R}_1}$, and repeat the argument, using

$$T_2 = T_{\mathcal{R}_1} - \lambda_2 I_{\mathcal{R}_1}.$$

By finite dimensionality the theorem follows after r steps.

The matrix interpretation of this theorem gives the Jordan canonical form
for a matrix. Let A be a matrix with distinct characteristic values $\lambda_i$, each
of multiplicity $s_i$ for $i = 1, \ldots, r$. Then A is similar to a matrix in the
block form

$$\begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{pmatrix},$$

where $A_i$ is $s_i \times s_i$ with $\lambda_i$ in every diagonal position, zeros and ones on the
superdiagonal in a certain arrangement, and zeros elsewhere.

Theorem 7.15. A matrix with characteristic values $\lambda_i$, $i = 1, \ldots, n$, is
similar to a matrix with these characteristic values in the diagonal
positions, zeros and ones along the superdiagonal, and zeros elsewhere.


The following observations add real information to the rather loose
statement of Theorem 7.15. Any Jordan matrix that is similar to A has r major
blocks $A_i$ on the diagonal whenever A has r distinct characteristic values.
$A_i$ is a square block of dimension $s_i$, the multiplicity of $\lambda_i$ as a characteristic
value. Each diagonal element of $A_i$ is $\lambda_i$, and any other nonzero entry of $A_i$
must be 1 and must occur on the superdiagonal of $A_i$. Furthermore, $\lambda_i$ can
appear as a diagonal element of no block other than $A_i$. Hence if A and B
are similar, and therefore have the same characteristic polynomial, the Jordan
form of each must contain the same number of blocks, each of the same
dimension. Except for possible permutations of the blocks along the diagonal,
the two Jordan matrices can possibly differ only on the superdiagonal.

Now consider the distribution of 0’s and 1’s on the superdiagonal of the
block $A_i$:

$$A_i = \begin{pmatrix} \lambda_i & * & & \\ & \lambda_i & \ddots & \\ & & \ddots & * \\ & & & \lambda_i \end{pmatrix}, \qquad * = 0 \text{ or } 1.$$

Since $(A_i - \lambda_i I)^{s_i} = Z$, we can apply Theorem 7.10 to decompose $A_i - \lambda_i I$
into sub-blocks $B_{ij}$ along the diagonal, $j = 1, 2, \ldots, k(i)$:

$$A_i - \lambda_i I = \begin{pmatrix} B_{i1} & & & \\ & B_{i2} & & \\ & & \ddots & \\ & & & B_{ik(i)} \end{pmatrix}.$$

The dimensions of the sub-blocks are given by the uniquely determined
numbers of Theorem 7.10,

$$p_{i1} \ge p_{i2} \ge \cdots \ge p_{ik(i)}.$$

This implies that the distribution of 0’s and 1’s on the superdiagonal of each
$A_i$ is uniquely determined; therefore, except for permutations of the major
blocks $A_i$ of A, the distribution of 0’s and 1’s on the superdiagonal of A is
uniquely determined. Hence for a given matrix A and a given ordering of its
distinct characteristic values, A is similar to one and only one matrix in
Jordan form. This is the assertion that for the $n \times n$ matrices the Jordan
form is canonical with respect to similarity.

This means that a linear transformation T determines r numbers, $s_1, \ldots, s_r$,
where each $s_i$ is the multiplicity of $\lambda_i$ as a root of the characteristic equation.
Each $s_i$ determines other numbers $p_{ij}$, in decreasing order,

$$p_{i1} \ge p_{i2} \ge \cdots \ge p_{ik(i)},$$

$$s_i = p_{i1} + p_{i2} + \cdots + p_{ik(i)}, \qquad i = 1, \ldots, r.$$

The number $p_{ij}$ is the dimension of the block $B_{ij}$ within the block $A_i$. The
superdiagonal elements of A are a string of $p_{11} - 1$ ones followed by a zero,
then $p_{12} - 1$ ones, and another zero, and so on. Finally, the Jordan form
is completely determined by the numbers,

$$\begin{array}{l} \lambda_1, \ldots, \lambda_r \\ s_1, \ldots, s_r \\ p_{11}, \ldots, p_{1k(1)} \\ \qquad \vdots \\ p_{r1}, \ldots, p_{rk(r)}. \end{array}$$

The numbers $p_{ij}$ are often written in the form

$$\{(p_{11}, p_{12}, \ldots, p_{1k(1)})(p_{21}, p_{22}, \ldots, p_{2k(2)}) \cdots (p_{r1}, p_{r2}, \ldots, p_{rk(r)})\},$$

and this form is called the Segre characteristic of A.
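
Since the characteristic values and the Segre characteristic determine the Jordan form completely, the matrix can be assembled mechanically. The following Python sketch illustrates this; the function name `jordan_matrix` and its argument layout (one tuple of sub-block sizes per characteristic value) are our own assumptions, not notation from the text.

```python
def jordan_matrix(values, segre):
    """Build a Jordan matrix.

    values: the distinct characteristic values, in the chosen order.
    segre:  one tuple of sub-block dimensions (p_i1, p_i2, ...) per value.
    """
    n = sum(sum(sizes) for sizes in segre)
    J = [[0] * n for _ in range(n)]
    start = 0
    for lam, sizes in zip(values, segre):
        for p in sizes:
            for i in range(start, start + p):
                J[i][i] = lam
                if i < start + p - 1:
                    J[i][i + 1] = 1  # p - 1 ones inside each sub-block
            start += p
    return J


# Characteristic values 2 and 0 with Segre characteristic {(2, 1)(1)}.
J = jordan_matrix([2, 0], [(2, 1), (1,)])
for row in J:
    print(row)
```

The same function applied to a longer Segre characteristic yields the strings of ones and zeros on the superdiagonal described above.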

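As an example, suppose that T has three distinct characteristic values,
$\lambda_1 = 2$, $\lambda_2 = -2$, $\lambda_3 = 0$, with Segre characteristic $\{(3, 2, 1)(3, 1)(1, 1)\}$. Then
the Jordan matrix that represents T is the $12 \times 12$ matrix whose diagonal
elements in order are 2, 2, 2, 2, 2, 2, -2, -2, -2, -2, 0, 0 and whose
superdiagonal elements in order are 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, with zeros
elsewhere.
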
Given an $n \times n$ matrix A, we frequently wish to find a Jordan matrix J
which is similar to A. It is possible to give a systematic procedure for
determining P such that $PAP^{-1}$ is in Jordan form. (See Exercise 9.) But in many
examples we can find J without computing P. First we determine the distinct
characteristic values $\lambda_1, \ldots, \lambda_r$ and their multiplicities $s_1, \ldots, s_r$. Only for
those i for which $s_i > 1$ is there any ambiguity about the size of the
sub-blocks within the major block $A_i$. Then we calculate the number $c_i$ of linearly
independent characteristic vectors that correspond to $\lambda_i$. If $c_i = s_i$, then $A_i$ is
diagonal: $A_i = \lambda_i I$. Even if $c_i < s_i$, if $s_i$ is small, there are a limited number
of possible forms for $A_i$, and by studying the indices of nilpotency of $A_i - \lambda_i I$,
the exact form of $A_i$ can be determined. Indeed if $s_i = 3$, then the form of $A_i$
is immediately determined by $c_i$, since $s_i - c_i$ is the number of 1’s on the
superdiagonal of $A_i$. If $s_i = 4$, no ambiguity exists for $c_i = 3$ or $c_i = 1$; for
$c_i = 2$, however, either of the superdiagonal patterns 1, 0, 1 or 1, 1, 0 is
possible, depending on whether $A_i - \lambda_i I$ is nilpotent of index 2 or of index 3.
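
The ambiguous case $s_i = 4$, $c_i = 2$ can also be settled by computing ranks. The sketch below relies on a standard fact that is not proved in the text: the number of sub-blocks of dimension at least j that belong to $\lambda$ equals the rank of $(A - \lambda I)^{j-1}$ minus the rank of $(A - \lambda I)^j$. The function names are hypothetical, and the entries and the characteristic value are assumed rational so that exact arithmetic is available.

```python
from fractions import Fraction


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def sub_block_sizes(A, lam):
    """Dimensions of the sub-blocks belonging to the characteristic value lam."""
    n = len(A)
    B = [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    ranks = [n]                      # rank of B^0 = I
    power = B
    while True:
        r = rank(power)
        if r == ranks[-1]:           # the ranks have stopped decreasing
            break
        ranks.append(r)
        power = matmul(power, B)
    # at_least[j - 1] = number of sub-blocks of dimension >= j
    at_least = [ranks[j - 1] - ranks[j] for j in range(1, len(ranks))]
    sizes = []
    for j, count in enumerate(at_least, start=1):
        bigger = at_least[j] if j < len(at_least) else 0
        sizes += [j] * (count - bigger)
    return sorted(sizes, reverse=True)


# Two 4 x 4 blocks with s = 4 and c = 2: superdiagonal patterns 1,0,1 and 1,1,0.
pattern_101 = [[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 1], [0, 0, 0, 2]]
pattern_110 = [[2, 1, 0, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
print(sub_block_sizes(pattern_101, 2))  # [2, 2]
print(sub_block_sizes(pattern_110, 2))  # [3, 1]
```

In both cases the number of sub-blocks is $c_i = 2$, and the largest sub-block dimension is the index of nilpotency of $A_i - \lambda_i I$.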

We state a few more facts, which are of importance but are not pursued
in this book. Let the characteristic polynomial of A be

$$P(\lambda) = (\lambda - \lambda_1)^{s_1}(\lambda - \lambda_2)^{s_2} \cdots (\lambda - \lambda_r)^{s_r}.$$

The polynomials $D_{ij}(\lambda) = (\lambda - \lambda_i)^{p_{ij}}$ are called elementary divisors of A, since
each is a divisor of $P(\lambda)$. It can be shown that the minimal polynomial of A is

$$M(\lambda) = (\lambda - \lambda_1)^{p_{11}}(\lambda - \lambda_2)^{p_{21}} \cdots (\lambda - \lambda_r)^{p_{r1}},$$

the product of the distinct elementary divisors of highest power. From this
it is clear that $P(\lambda)$ is a multiple of $M(\lambda)$ and that $P(A) = Z$ since $M(A) = Z$.
We shall give an independent proof of this, the Hamilton-Cayley theorem,
in the next section.

Finally, we remind ourselves that the theory of the Jordan form depended 
upon a factorization of P(X) into linear factors. This is always possible if 
the base field is the field of complex numbers, but other canonical forms 
might be needed if we are restricted to work, for example, entirely within 
the field of real numbers. 

Exercises 

1. Use the Jordan canonical form to solve the following problems. 

(i) Exercise 8, § 7.1. 

(ii) Exercise 3, § 7.2. 

(iii) Exercise 4, § 7.2. 

(iv) Prove Theorem 7.8. 

2. Determine whether or not the following matrices are similar: 
A =





3. Determine necessary and sufficient conditions on a, b, c, d so that the
matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

will not be similar to a diagonal matrix.

4. Determine the Jordan form of the following matrices. 

(i) Exercise 1 (iii), § 7.2. 

(ii) Exercise 1 (ii), §7.5. 



5. State and prove a theorem which gives a matrix interpretation of 
Theorem 7.13. 

6. Prove that the inverse of a nonsingular Jordan matrix has nonzero
entries only on and above the diagonal but in general is not in Jordan form.

7. Consider the $12 \times 12$ Jordan matrix J whose diagonal elements in order
are 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1 and whose superdiagonal elements in order are
1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0. In the terminology of the discussion following
Theorem 7.15, supply the following information.

(i) Write J in block form, showing the sub-blocks of each major
block.

(ii) Write the characteristic values and the multiplicity of each.

(iii) Write the numbers $p_{ij}$.

(iv) Write the Segre characteristic of J.

(v) Write the characteristic polynomial of J.

(vi) Write the minimal polynomial of J.

(vii) Write the elementary divisors of J.

(viii) By examining the block form of J, show that $M(J) = Z$ for the
polynomial M of (vi).


8. Write a matrix whose characteristic values are 2, 3, 4, 5 and whose 
Segre characteristic is {(2, 1, 1) (1, 1, 1)(3, 2)(1)}. 

9. Read “The companion matrix and its properties” by Louis Brand,
American Mathematical Monthly, vol. 71 (1964), pp. 629-634. This article
describes a method of finding P such that $P^{-1}AP$ is in Jordan form.

10. Given the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 4 & -4 & -3 & 4 \end{pmatrix},$$

determine a Jordan matrix which is similar to A by finding the characteristic
values of A and, if necessary, some of the characteristic vectors. Then check
your conclusion by showing that $P^{-1}AP$ is a Jordan matrix, where

$$P = \begin{pmatrix} 1 & 1 & 1 & 0 \\ -1 & 1 & 2 & 1 \\ 1 & 1 & 4 & 4 \\ -1 & 1 & 8 & 12 \end{pmatrix}.$$

§ 7.8. An Application of the Hamilton-Cayley Theorem

We are now able to give an easy proof of the Hamilton-Cayley theorem.
Let A be a matrix with characteristic polynomial

$$P(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n),$$

and let J be a Jordan matrix which is similar to A,

$$J = \begin{pmatrix} \lambda_1 & * & & & \\ & \lambda_2 & * & & \\ & & \ddots & \ddots & \\ & & & \lambda_{n-1} & * \\ & & & & \lambda_n \end{pmatrix},$$

where $* = 0$ or 1. Let $J_i = J - \lambda_i I$, $i = 1, 2, \ldots, n$, so that

$$P(J) = (-1)^n J_1 J_2 \cdots J_n,$$

since the factoring for $P(\lambda)$ holds for the matrix polynomial $P(X)$. Now the
ith row of $J_i$ has $*$ in the $i + 1$ column and zeros elsewhere. Therefore the
nth row of $J_n$ is a zero row; the last two rows of $J_{n-1}J_n$ are zero; the last three
rows of $J_{n-2}J_{n-1}J_n$ are zero rows, and so on. Therefore successive
multiplication of the $J_i$ gives

$$P(J) = Z.$$

By Exercises 4 and 5, § 7.3, $P(A)$ is similar to $P(J) = Z$, and hence $P(A) = Z$,
which is the desired conclusion.

We now apply the Hamilton-Cayley theorem to obtain another method of
calculating the inverse of a nonsingular matrix A. Let the characteristic
polynomial of A be

$$(-1)^n(\lambda^n + c_1\lambda^{n-1} + \cdots + c_n).$$

Since A is nonsingular, $\lambda = 0$ is not a characteristic value, so $c_n \neq 0$. We have

$$A^n + c_1 A^{n-1} + \cdots + c_n I = Z,$$

$$(A^n + c_1 A^{n-1} + \cdots + c_{n-1}A) = -c_n I,$$

and therefore

$$A^{-1} = -c_n^{-1}(A^{n-1} + c_1 A^{n-2} + \cdots + c_{n-1}I).$$

Thus $A^{-1}$ may be calculated as a combination of powers of A and the
coefficients of the characteristic polynomial of A. For large values of n the $c_i$ may
be very hard to compute directly, but an alternate method is described below.

Definition 7.7. The trace of A, denoted tr A, is the sum of the n
characteristic values of A;

$$\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i.$$

In addition to the present application, the concept of trace is quite useful.
We recall that similar matrices have equal characteristic values and therefore
have equal traces. Thus the trace of a linear transformation T can be defined
to be the trace of any matrix that represents T. We now prove that tr A is
the sum of the diagonal elements of A.

Theorem 7.16.

$$\operatorname{tr} A = \sum_{i=1}^{n} a_{ii},$$

and hence if A and B are similar,

$$\sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} b_{ii}.$$

proof: Consider the characteristic polynomial of A,

$$\det(A - \lambda I) = (-1)^n(\lambda^n + c_1\lambda^{n-1} + \cdots + c_n) = (-1)^n(\lambda - \lambda_1) \cdots (\lambda - \lambda_n).$$

From the determinant form, the coefficient of $\lambda^{n-1}$ is seen to be

$$(-1)^{n-1}\sum_{i=1}^{n} a_{ii} = (-1)^n c_1,$$

while from the factored form we get

$$(-1)^{n+1}\sum_{i=1}^{n} \lambda_i = (-1)^n c_1.$$

Hence

$$\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} a_{ii}.$$

As a corollary we have the equation

$$c_1 = -\operatorname{tr} A.$$

The other $c_i$ can be determined similarly to give the following set of equations:

$$\begin{aligned} c_1 &= -\operatorname{tr} A \\ c_2 &= -2^{-1}[c_1 \operatorname{tr} A + \operatorname{tr} A^2] \\ c_3 &= -3^{-1}[c_2 \operatorname{tr} A + c_1 \operatorname{tr} A^2 + \operatorname{tr} A^3] \\ &\;\;\vdots \\ c_n &= -n^{-1}[c_{n-1} \operatorname{tr} A + c_{n-2} \operatorname{tr} A^2 + \cdots + c_1 \operatorname{tr} A^{n-1} + \operatorname{tr} A^n]. \end{aligned}$$

These equations permit us to calculate $A^{-1}$ by calculating $A, A^2, \ldots, A^{n-1}$
and the diagonal elements of $A^n$. From these matrices we can calculate the
traces as sums of the diagonal elements, determine the $c_i$, and finally use the
equation

$$A^{-1} = -c_n^{-1}(A^{n-1} + c_1 A^{n-2} + \cdots + c_{n-1}I).$$

This method for finding $A^{-1}$ requires fewer than $n^4$ multiplications and is
easily described in the language of high speed computers. Also we obtain
the characteristic polynomial of A as a by-product of the computation of $A^{-1}$.
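
The method translates directly into a short program. The Python sketch below is one possible rendering, not a definitive implementation: the function names are our own, and exact rational arithmetic (`fractions.Fraction`) is assumed so that the divisions by 2, 3, ..., n introduce no rounding. It returns the coefficients $c_1, \ldots, c_n$ together with $A^{-1}$.

```python
from fractions import Fraction


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    """Sum of the diagonal elements."""
    return sum(A[i][i] for i in range(len(A)))


def char_coeffs_and_inverse(A):
    """Return ([c_1, ..., c_n], inverse of A) computed from traces of powers of A."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = [identity, A]                 # powers[k] = A^k
    for _ in range(2, n + 1):
        powers.append(matmul(powers[-1], A))
    t = [trace(P) for P in powers]         # t[k] = tr A^k
    c = [Fraction(1)]                      # c[0] = 1, the leading coefficient
    for k in range(1, n + 1):
        # c_k = -(1/k) [c_{k-1} tr A + c_{k-2} tr A^2 + ... + c_1 tr A^{k-1} + tr A^k]
        s = sum(c[k - j] * t[j] for j in range(1, k + 1))
        c.append(-s / k)
    if c[n] == 0:
        raise ValueError("c_n = 0: the matrix is singular")
    # A^{-1} = -(1/c_n) (A^{n-1} + c_1 A^{n-2} + ... + c_{n-1} I)
    inverse = [[-sum(c[k] * powers[n - 1 - k][i][j] for k in range(n)) / c[n]
                for j in range(n)] for i in range(n)]
    return c[1:], inverse


A = [[2, 1],
     [1, 1]]
coeffs, A_inv = char_coeffs_and_inverse(A)
print([str(x) for x in coeffs])                 # ['-3', '1']
print([[str(x) for x in row] for row in A_inv])  # [['1', '-1'], ['-1', '2']]
```

For this A the characteristic polynomial is $(-1)^2(\lambda^2 - 3\lambda + 1)$, in agreement with tr A = 3 and det A = 1.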

Exercises 

1. Use the preceding method to calculate the inverse and the
characteristic polynomial of the matrix of Exercise 1 (ii), § 6.3.

2. Prove that the trace of $A^k$ is the sum of the kth powers of the
characteristic values of A.

3. Verify that

$$c_2 = -\tfrac{1}{2}(c_1 \operatorname{tr} A + \operatorname{tr} A^2).$$

4. Prove the following properties of the trace.

(i) If A is nilpotent, tr A = 0.

(ii) If A is idempotent, tr A = $\rho(A)$.

(iii) tr(A + B) = tr A + tr B.

(iv) tr(kA) = k·tr A.

(v) tr A′ = tr A.

(vi) tr(AB) = tr(BA).

5. Let A and B be 2 X 2 matrices for which det A = det B and 
tr A = tr B. 

(i) Do A and B have the same characteristic values? Prove your 

answer. 

(ii) Are A and B similar? Prove your answer. 

(iii) Would your answers to (i) or (ii) be different if A and B were 
3X3 matrices? 

6. (i) Prove that if T is a projection, then for any matrix which
represents T, the sum of the diagonal elements is a nonnegative integer.

(ii) Prove that any projection can be represented by a diagonal matrix
whose nonzero entries must be 1.

7. Show that the number of $12 \times 12$ matrices, all of which have
$\lambda^5(\lambda - 1)^4(\lambda + 1)^3$ as characteristic polynomial but no two of which are
similar, exceeds 100.

8. Let $H_n$ be the $n \times n$ matrix obtained by reflecting $I_n$ across a
horizontal line through its center.

(i) Show that $H_n A$ is the matrix obtained by reflecting A across a
horizontal line through its center, while $AH_n$ is the matrix obtained by
reflecting A across a vertical line through its center.

(ii) Describe $H_n A H_n^{-1}$.

9. Prove that any matrix and its transpose are similar.

10. Let A be an $n \times n$ matrix such that $a_{ik}a_{jk} = a_{kk}a_{ij}$ for all i, j, and k.
Determine the characteristic values of A.
§ 8.1. Conjugate Bilinear Functions

Up to this point our study of vector spaces has been accomplished without
any reference to the customary concepts of geometric measurement such as
length, distance, angle, and perpendicularity. This is in sharp contrast to
the usual development of the geometry of the plane or three-dimensional
space, which from the start assumes a rectangular coordinate system and
defines distance in terms of that system; for an abstract n-dimensional space
in which we wish to consider a variety of coordinate systems, such an approach
would be too confining. We shall now see how metric notions can be derived
generally. In so doing we shall discover an interpretation of matrices which
is distinct from the matrix representations of linear transformations and linear
equations studied previously.

A moment’s reflection makes it clear that length, distance, and angle are
scalar quantities which are attached to each vector or each pair of vectors of
a space. Thus, it is natural that we consider various functions from vector
spaces to the scalar field.

We have already seen that linear functions from $\mathcal{V}$ to $\mathcal{F}$ form a vector
space $\mathcal{V}'$, the dual space of $\mathcal{V}$. We now select two vector spaces $\mathcal{V}_m$ and $\mathcal{W}_n$
over the same field $\mathcal{F}$ and consider functions which map the cartesian product
$\mathcal{V}_m \times \mathcal{W}_n$ into $\mathcal{F}$. Since the properties of $\mathcal{F}$ come into greater prominence
here than in our earlier work, we shall develop two parallel theories according
to whether $\mathcal{F}$ is the real or the complex field. (Readers who are not fully
familiar with the arithmetic of complex numbers should study Exercise 1
before proceeding.)

The conjugate $\bar{A}$ of a complex matrix $A = (a_{ij})$ is defined by

$$\bar{A} = (\bar{a}_{ij});$$

then a real matrix is characterized by $\bar{A} = A$. Other conjugate properties
of complex matrices, which may be proved as an exercise, are

$$\bar{\bar{A}} = A,$$

$$\overline{AB} = \bar{A}\,\bar{B},$$

$$\overline{A'} = \bar{A}',$$

$$\overline{A + B} = \bar{A} + \bar{B}.$$

Definition 8.1. Let $\mathcal{V}_m$ and $\mathcal{W}_n$ be vector spaces over $\mathcal{F}$ (real or complex).
A conjugate bilinear function f is a function that assigns to each pair of
vectors $(\xi, \eta) \in \mathcal{V}_m \times \mathcal{W}_n$ a scalar value $f(\xi, \eta)$ in such a way that the
following properties hold:

(a) $f(a\xi_1 + b\xi_2, \eta) = af(\xi_1, \eta) + bf(\xi_2, \eta)$,

(b) $f(\xi, a\eta_1 + b\eta_2) = \bar{a}f(\xi, \eta_1) + \bar{b}f(\xi, \eta_2)$.

Condition (a) states that f is a linear function of its first variable (which
is a vector); condition (b) is a modified form of linearity in the second variable:

$$f(\xi, \eta_1 + \eta_2) = f(\xi, \eta_1) + f(\xi, \eta_2),$$

$$f(\xi, b\eta) = \bar{b}f(\xi, \eta).$$

If the scalar field is real, then $\bar{b} = b$, in which case f is called simply a bilinear
function.

First we see how conjugate bilinear functions may be represented by
matrices; the work we do here will bear a striking similarity to the
corresponding matrix representation of linear transformations. Let $\{\alpha_1, \ldots, \alpha_m\}$
be a basis for $\mathcal{V}_m$ and $\{\beta_1, \ldots, \beta_n\}$ a basis for $\mathcal{W}_n$. Let $\xi = \sum_{i=1}^{m} x_i\alpha_i \in \mathcal{V}_m$
and $\eta = \sum_{j=1}^{n} y_j\beta_j \in \mathcal{W}_n$. Then

$$\begin{aligned} f(\xi, \eta) &= f\Big(\sum_{i=1}^{m} x_i\alpha_i, \eta\Big) = \sum_{i=1}^{m} x_i f(\alpha_i, \eta) \\ &= \sum_{i=1}^{m} x_i f\Big(\alpha_i, \sum_{j=1}^{n} y_j\beta_j\Big) = \sum_{i=1}^{m} x_i \Big[\sum_{j=1}^{n} \bar{y}_j f(\alpha_i, \beta_j)\Big] \\ &= \sum_{i=1}^{m}\sum_{j=1}^{n} x_i \bar{y}_j f(\alpha_i, \beta_j). \end{aligned}$$

The mn scalars $f(\alpha_i, \beta_j)$ therefore completely determine the value of the
function f. Now consider the $m \times n$ matrix

$$A = (a_{ij}), \quad \text{where } a_{ij} = f(\alpha_i, \beta_j),$$

which is uniquely determined by f relative to the $\alpha$- and $\beta$-bases. Then an
easy calculation shows that

$$f(\xi, \eta) = XA\bar{Y}'.$$

Definition 8.2. An expression of the type

$$\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} x_i \bar{y}_j$$

is called a conjugate bilinear form in the $m + n$ variables $x_1, \ldots, x_m$,
$y_1, \ldots, y_n$.

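Thus each conjugate bilinear function f on $\mathcal{V}_m \times \mathcal{W}_n$ determines, relative
to chosen bases, a conjugate bilinear form whose coefficient matrix
represents f relative to those bases. For a different choice of bases we can expect
to obtain a form with different coefficients and therefore a different matrix
representation of f. As in § 6.5, suppose that $\{\gamma_1, \ldots, \gamma_m\}$ is a basis for $\mathcal{V}_m$
and that $\{\delta_1, \ldots, \delta_n\}$ is a basis for $\mathcal{W}_n$. Let R, S be the linear
transformations $\gamma_i R = \alpha_i$ and $\delta_j S = \beta_j$; suppose that R is represented relative to the
$\gamma$-basis by P and that S is represented relative to the $\delta$-basis by Q. The
matrix C which represents f relative to the $\gamma$- and $\delta$-bases is defined by
$c_{rs} = f(\gamma_r, \delta_s)$. Then

$$\begin{aligned} a_{ij} = f(\alpha_i, \beta_j) &= f\Big(\sum_{r=1}^{m} p_{ir}\gamma_r, \sum_{s=1}^{n} q_{js}\delta_s\Big) \\ &= \sum_{r=1}^{m} p_{ir} f\Big(\gamma_r, \sum_{s=1}^{n} q_{js}\delta_s\Big) = \sum_{r=1}^{m} p_{ir}\Big[\sum_{s=1}^{n} \bar{q}_{js} f(\gamma_r, \delta_s)\Big] \\ &= \sum_{r=1}^{m}\sum_{s=1}^{n} p_{ir}\, c_{rs}\, \bar{q}_{js}. \end{aligned}$$

A direct computation shows that the last expression is the (i, j) element of
the matrix $PC\bar{Q}'$. Thus $A = PC\bar{Q}'$.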

Theorem 8.1. A conjugate bilinear function f from $\mathcal{V}_m \times \mathcal{W}_n$ to $\mathcal{F}$ is
represented uniquely relative to a pair $\alpha$, $\beta$ of bases by the $m \times n$ matrix

$$A = (a_{ij}), \quad \text{where } a_{ij} = f(\alpha_i, \beta_j);$$

if $\xi$, $\eta$ are represented by X, Y, then

$$f(\xi, \eta) = XA\bar{Y}'.$$

Furthermore, $m \times n$ matrices A and C represent the same conjugate
bilinear function relative to different pairs of bases if and only if A and C
are equivalent.

proof: The representation of f by A was established by the previous
discussion. Uniqueness follows easily since two functions are different
if their functional values differ at any point. The discussion also shows
that the same f is represented relative to two pairs of bases by equivalent
matrices. Conversely, if A and C are equivalent matrices, let $A = PCR$
for nonsingular P, R. Let $Q = \bar{R}'$, and retrace the calculations already
made to show that if C represents f relative to the $\gamma$-, $\delta$-bases, then A
represents f relative to the bases $\gamma R$, $\delta S$ obtained from the nonsingular
linear transformations represented by P and Q.
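
A small numerical check may make the change of bases concrete. In the Python sketch below, the matrices C, P, Q and the coordinate rows X, Y are arbitrary choices of ours, and the helper names are hypothetical. Coordinates are written as rows, so a vector with coordinates X relative to the $\alpha$-basis has coordinates XP relative to the $\gamma$-basis, and similarly YQ relative to the $\delta$-basis. With $A = PC\bar{Q}'$, both representations give the same value of f.

```python
def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conj_transpose(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]


def form_value(X, A, Y):
    """Value X A conj(Y)' of the conjugate bilinear form with matrix A."""
    return matmul(matmul([X], A), conj_transpose([Y]))[0][0]


C = [[1, 1j], [0, 2]]     # represents f relative to the gamma- and delta-bases
P = [[1, 1], [0, 1j]]     # nonsingular: alpha_i = sum over r of p_ir gamma_r
Q = [[1, 0], [1j, 1]]     # nonsingular: beta_j  = sum over s of q_js delta_s

A = matmul(matmul(P, C), conj_transpose(Q))   # A = P C conj(Q)'

X, Y = [1, 2], [1j, 1]    # coordinates relative to the alpha- and beta-bases
XP = matmul([X], P)[0]    # the same vector relative to the gamma-basis
YQ = matmul([Y], Q)[0]    # the same vector relative to the delta-basis

print(form_value(X, A, Y))    # (2+3j)
print(form_value(XP, C, YQ))  # (2+3j)
```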

Since a conjugate bilinear function is represented in different coordinate
systems by equivalent matrices, and since equivalent matrices have the same
rank, we define the rank of a conjugate bilinear function (or a conjugate
bilinear form) to be the rank of any representative matrix.

Theorem 8.2. Let f be a conjugate bilinear function of rank r from
$\mathcal{V}_m \times \mathcal{W}_n$ into $\mathcal{F}$, where $r \le m, n$. There exist bases for $\mathcal{V}_m$ and $\mathcal{W}_n$
such that if $\xi \in \mathcal{V}_m$ and $\eta \in \mathcal{W}_n$, then

$$f(\xi, \eta) = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_r\bar{y}_r.$$

proof: Let f be represented by an $m \times n$ matrix A of rank r relative
to some pair of bases for $\mathcal{V}_m$ and $\mathcal{W}_n$. Any matrix equivalent to A has
the same rank and represents the same conjugate bilinear function f.
By Theorem 6.14, A is equivalent to the canonical matrix C which has
the block $I_r$ in the upper left corner and zeros elsewhere. Then from the
proof of Theorem 8.1,

$$f(\xi, \eta) = XC\bar{Y}' = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_r\bar{y}_r.$$

In many important applications we are interested in conjugate bilinear
functions from $\mathcal{V}_n \times \mathcal{V}_n$ to $\mathcal{F}$. In this case a sharpened form of Theorem 8.1
can be obtained by using two bases rather than two pairs of bases. Thus in
the proof of Theorem 8.1 if we specify that $\gamma$ and $\delta$ be the same basis, and
that $\alpha$ and $\beta$ be the same basis, then R = S and P = Q. We obtain the
following result.

Theorem 8.3. A conjugate bilinear function f from $\mathcal{V}_n \times \mathcal{V}_n$ to $\mathcal{F}$ is
represented relative to two bases for $\mathcal{V}_n$ by two $n \times n$ matrices A, C if
and only if there exists a nonsingular matrix P such that

$$A = PC\bar{P}'.$$

The relation $A = PC\bar{P}'$ obtained in Theorem 8.3 defines two special types
of matrix equivalence, according to whether $\mathcal{F}$ is complex or not.

Definition 8.3. Two complex $n \times n$ matrices A, C are said to be
conjunctive over $\mathcal{C}$ if and only if

$$A = PC\bar{P}'$$

for some nonsingular complex matrix P. For any field $\mathcal{F}$ two $n \times n$
matrices A, C with elements in $\mathcal{F}$ are said to be congruent over $\mathcal{F}$ if and
only if

$$A = PCP'$$

for some nonsingular matrix P with elements in $\mathcal{F}$.

It is easy to verify that conjunctivity and congruence are further examples
of equivalence relations on matrices, distinct from the equivalence relations
we have considered previously: row equivalence, equivalence, and similarity.

In terms of row and column operations, the description of congruence of
matrices is easily stated. Since P describes a sequence of elementary row
operations, P′ describes the same sequence of elementary column operations.
Hence A and B are congruent if and only if A can be obtained by
transforming B by a sequence of changes, each change being an elementary operation
on rows followed by the same operation on the corresponding columns.

For conjunctivity a slight modification of this result is needed. P still
represents a sequence of elementary row operations. For the three types of
row operations, specified in § 6.2, we have $\bar{P}'_{ij} = P'_{ij} = P_{ij}$ and $\bar{A}'_{ij} = A'_{ij} = A_{ji}$,
but $\bar{M}'_i(c) = M'_i(\bar{c}) = M_i(\bar{c})$. Hence A and B are conjunctive if and only
if A can be obtained by transforming B by a sequence of changes, each change
being an elementary operation on rows followed by the corresponding
conjugate operation on columns.

It is convenient to use the symbol $A^*$ to denote the conjugate transpose $\bar{A}'$
of A. In § 8.6 we shall see that if A represents a linear transformation T
on $\mathcal{V}$, then $A^*$ also represents a transformation $T^*$ on $\mathcal{V}$. The
transformation $T^*$ is called the adjoint of T, but since the term “adjoint” has already
been introduced for matrices in Definition 5.5 to mean a matrix of cofactors,
we avoid calling the matrix $A^*$ the adjoint of A.

Definition 8.4. An $n \times n$ complex matrix A is said to be Hermitian if
and only if $A^* = A$, where $A^* = \bar{A}'$. A is said to be skew-Hermitian
if and only if $A^* = -A$.

In the literature the term Hermitian congruence is sometimes used in place
of conjunctivity, and a Hermitian matrix is sometimes called self-adjoint.
Just as an arbitrary square matrix was shown in § 4.2 to have a unique
representation as the sum of a symmetric matrix and a skew-symmetric
matrix, any complex matrix has a unique representation as the sum of a
Hermitian matrix and a skew-Hermitian matrix:

$$A = \tfrac{1}{2}(A + A^*) + \tfrac{1}{2}(A - A^*).$$
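
The decomposition can be checked on a small example. In the Python sketch below the matrix A is an arbitrary choice of ours and the helper name `conj_transpose` is hypothetical; H denotes the Hermitian part and K the skew-Hermitian part.

```python
def conj_transpose(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]


A = [[1 + 2j, 3],
     [4j, 5]]
A_star = conj_transpose(A)
n = len(A)

# H = (A + A*)/2 and K = (A - A*)/2
H = [[(A[i][j] + A_star[i][j]) / 2 for j in range(n)] for i in range(n)]
K = [[(A[i][j] - A_star[i][j]) / 2 for j in range(n)] for i in range(n)]

print(conj_transpose(H) == H)                               # True: H* = H
print(conj_transpose(K) == [[-x for x in row] for row in K])  # True: K* = -K
```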

Separate canonical forms relative to conjunctivity can be obtained for
Hermitian and for skew-Hermitian matrices, and for reasons which will
become apparent in § 8.5, we are more interested in the Hermitian case. (See
also Exercises 8 and 9.)
Theorem 8.4. Every complex Hermitian matrix of rank r is
conjunctive over $\mathcal{C}$ to a diagonal matrix with nonzero real numbers in the
first r diagonal positions and zeros elsewhere.

proof: Let A be Hermitian and P nonsingular. Then $(PAP^*)^* =
P^{**}A^*P^* = PAP^*$, so that a Hermitian matrix of the same rank is
obtained from A by applying an elementary row operation followed by
the corresponding conjugate column operation. Every diagonal element
of a Hermitian matrix is real. We show that if $A \neq Z$, A is conjunctive to
a matrix in which some diagonal element is nonzero. Suppose that all
diagonal elements of A are zero, and let $a_{ij} \neq 0$. We may assume that the
real part of $a_{ij}$ is nonzero, for otherwise A is conjunctive to $M_i(\mathrm{i})\,A\,M_i(-\mathrm{i})$,
where $\mathrm{i}^2 = -1$, in which the (i, j) entry is real and nonzero. Then
$B = A_{ij}AA_{ij}^* = A_{ij}AA_{ji}$
is Hermitian, obtained from A by adding row i to row j and then
adding column i to column j. Hence $b_{jj} = a_{ij} + a_{ji} = a_{ij} + \bar{a}_{ij} \neq 0$. By
permuting row j and row 1, then column j and column 1, we obtain a
matrix C conjunctive to A and having $c_{11} = b_{jj}$. If $c_{i1} \neq 0$, the sequence
$A_{i1}(-c_{i1}c_{11}^{-1})$ of row operations and the corresponding conjugate
column operations will then produce zeros in the (i, 1) and (1, i) positions
for $i = 2, \ldots, n$. The resulting matrix is

$$\begin{pmatrix} c_{11} & Z \\ Z & A_1 \end{pmatrix}.$$

If $A_1 = Z$, we are through; otherwise the process may be repeated on
$A_1$. Since conjunctive matrices have the same rank, after r steps we are
through.

Theorem 8.5. Every symmetric matrix of rank r over $\mathcal{F}$ is congruent
over $\mathcal{F}$ to a diagonal matrix with nonzero elements in the first r diagonal
positions and zeros elsewhere, provided $1 + 1 \neq 0$ in $\mathcal{F}$.

proof: The proof of Theorem 8.4, modified by using transposes
instead of conjugate transposes, is effective.

Theorem 8.6. Every complex Hermitian matrix of rank r is conjunc- 
tive over C to a diagonal matrix with 1 in the first p diagonal positions, 
— 1 in the next r — p diagonal positions, and zeros elsewhere. 

proof: Let a Hermitian matrix A be conjunctive to a matrix D in
the diagonal form of Theorem 8.4:

$$D = \operatorname{diag}(d_1, \ldots, d_r, 0, \ldots, 0).$$

Let p denote the number of positive elements of D. Then $P_{ij} D P_{ij}^*$
coincides with D except that $d_i$ and $d_j$ have been interchanged. Hence A is
conjunctive to a diagonal matrix E in which the positive diagonal
elements come first, say $c_1, \ldots, c_p$, the negative diagonal elements can
be expressed as $-c_{p+1}, \ldots, -c_r$, and the remaining elements are zero.
Thus $c_i^{-1/2}$ is a real number for i = 1, . . . , r, and

$$PEP^*$$

is in the form stated in the theorem, where

$$P = M_r(c_r^{-1/2}) \cdots M_1(c_1^{-1/2}).$$

Theorem 8.7. Every real symmetric matrix of rank r is congruent
over R to a diagonal matrix with 1 in the first p diagonal positions,
−1 in the next r − p diagonal positions, and zeros elsewhere.

proof: The proof of Theorem 8.6, modified for congruence instead of
conjunctivity, is effective. Notice that the argument will not be valid
in any field that fails to contain a positive square root of each of its
positive elements.

It is an important fact that the form stated in Theorem 8.6 is canonical
for Hermitian matrices under conjunctivity. Similarly, the form stated in
Theorem 8.7 is canonical for real symmetric matrices under congruence. This
means, of course, that each Hermitian matrix determines a unique value
of r and of p, and similarly for each real symmetric matrix.

To prove uniqueness, let A be an n × n Hermitian matrix. Relative to
an arbitrary basis $\{\alpha_1, \ldots, \alpha_n\}$ for $V_n$, A determines a conjugate bilinear
function f from $V_n \times V_n$ to C: $f(\alpha_i, \alpha_j) = a_{ij}$. Since A is Hermitian,
Theorem 8.6 guarantees that there exists a basis $\{\beta_1, \ldots, \beta_n\}$, relative to which
f is represented by the matrix B = PAP* of the block form

$$B = \begin{pmatrix} I_p & & \\ & -I_{r-p} & \\ & & Z \end{pmatrix}$$

for some r and p. Suppose there exists another basis $\{\gamma_1, \ldots, \gamma_n\}$, relative
to which f is represented by C = QAQ*, where

$$C = \begin{pmatrix} I_q & & \\ & -I_{s-q} & \\ & & Z \end{pmatrix}.$$

Since conjunctive matrices have the same rank,

$$r = \rho(B) = \rho(A) = \rho(C) = s.$$

Let $\xi = \sum_{i=1}^{n} x_i \beta_i = \sum_{i=1}^{n} u_i \gamma_i$ and $\eta = \sum_{i=1}^{n} y_i \beta_i = \sum_{i=1}^{n} v_i \gamma_i$. Then

$$\begin{aligned}
f(\xi, \eta) = XBY^* &= x_1 \bar{y}_1 + \cdots + x_p \bar{y}_p - x_{p+1} \bar{y}_{p+1} - \cdots - x_r \bar{y}_r \\
= UCV^* &= u_1 \bar{v}_1 + \cdots + u_q \bar{v}_q - u_{q+1} \bar{v}_{q+1} - \cdots - u_r \bar{v}_r.
\end{aligned}$$

Furthermore let $M = [\beta_{p+1}, \ldots, \beta_r]$ and $N = [\gamma_1, \ldots, \gamma_q, \gamma_{r+1}, \ldots, \gamma_n]$.
Then dim M + dim N = (r − p) + (n − r + q) = n + (q − p). Suppose
that q > p; then $M \cap N \neq [0]$. For any nonzero $\xi \in M \cap N$ we have

$$\begin{aligned}
f(\xi, \xi) &= -x_{p+1} \bar{x}_{p+1} - \cdots - x_r \bar{x}_r < 0, \\
            &= u_1 \bar{u}_1 + \cdots + u_q \bar{u}_q \geq 0,
\end{aligned}$$

a contradiction. Hence q ≤ p. By reversing the roles of p and q, the reverse
inequality follows, so p = q.

Hence any n × n Hermitian matrix A uniquely determines two
nonnegative integers p and r such that A is conjunctive to one and only one
matrix B in the form given above. Instead of specifying r and p to identify
the conjunctivity class to which A belongs, it is customary to specify r
and s, where s = 2p − r; clearly s is uniquely determined by A.

Definition 8.5. The signature s of a Hermitian matrix A is defined by

$$s = p - (r - p) = 2p - r,$$

which is the number of diagonal 1's diminished by the number of diagonal
−1's in the canonical form of A relative to conjunctivity over C. Similarly,
the signature of a real symmetric matrix A is 2p − r, where p
and r are determined from the canonical form of A relative to congruence
over R.

The second part of this definition anticipates that p and r are uniquely 
determined for each real symmetric matrix, which may be proved by a 
straightforward modification of the argument given for the Hermitian case, 
preceding Definition 8.5. 
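
The reduction used in Theorems 8.5 and 8.7 can be carried out mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `rank_and_signature` is hypothetical, the entries are taken to be rational so that exact arithmetic is available, and the matrix is assumed to be symmetric. It applies identical row and column operations until a diagonal matrix is reached, then counts the positive diagonal elements to obtain r, p, and s = 2p − r.

```python
from fractions import Fraction


def rank_and_signature(matrix):
    """Return (r, p, s) for a real symmetric matrix with rational entries.

    The matrix is reduced by identical row and column operations
    (a congruence), so rank and signature are unchanged at every step.
    """
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    diagonal = []
    start = 0
    while start < n:
        # Look for a nonzero diagonal element in the unreduced block.
        pivot = next((i for i in range(start, n) if a[i][i] != 0), None)
        if pivot is None:
            # All diagonal elements are zero: find a nonzero a[i][j].
            pair = next(((i, j) for i in range(start, n)
                         for j in range(i + 1, n) if a[i][j] != 0), None)
            if pair is None:
                break  # the remaining block is the zero matrix
            i, j = pair
            # Add row i to row j, then column i to column j;
            # the new (j, j) element is 2 * a[i][j], which is nonzero.
            for c in range(n):
                a[j][c] += a[i][c]
            for r in range(n):
                a[r][j] += a[r][i]
            pivot = j
        # Permute rows and columns to bring the pivot to position `start`.
        a[start], a[pivot] = a[pivot], a[start]
        for r in range(n):
            a[r][start], a[r][pivot] = a[r][pivot], a[r][start]
        d = a[start][start]
        # Clear the rest of row `start` and column `start`.
        for i in range(start + 1, n):
            factor = a[i][start] / d
            if factor != 0:
                for c in range(n):
                    a[i][c] -= factor * a[start][c]
                for r in range(n):
                    a[r][i] -= factor * a[r][start]
        diagonal.append(d)
        start += 1
    r = len(diagonal)
    p = sum(1 for d in diagonal if d > 0)
    return r, p, 2 * p - r


print(rank_and_signature([[0, 1], [1, 0]]))
print(rank_and_signature([[2, 1], [1, 2]]))
```

The first matrix has zero diagonal, so the sketch exercises the "add row i to row j" step of the proof of Theorem 8.4 before any pivot is available.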

Theorem 8.8. Two n × n Hermitian matrices are conjunctive over C
if and only if they have the same rank and the same signature.

proof: If A and B have the same rank and signature, each is
conjunctive over C to the same matrix in canonical form, and hence they are
conjunctive to each other. Conversely, if A and B are conjunctive, so
are their respective canonical forms, which by the uniqueness argument
must be equal. Hence they have the same rank and the same signature.

Theorem 8.9. Two n × n real symmetric matrices are congruent over
R if and only if they have the same rank and the same signature.

proof: Exercise. First you must establish the uniqueness of the
canonical form for congruence over R of a real symmetric matrix.

Finally we observe that for complex symmetric matrices the diagonalization
process described for the real case by Theorem 8.7 can be extended to
obtain a particularly simple standard form for congruence over C.

Theorem 8.10. Every complex symmetric matrix of rank r is
congruent over C to the matrix

$$\begin{pmatrix} I_r & Z \\ Z & Z \end{pmatrix}.$$

proof: Exercise.

Exercises 

1. Operations for complex numbers are defined as follows, where a, b , c, 
and d are real : 

Sum: (a + ib) + (c + id) = (a + c) + i(b + d). 

Product: (a + ib)(c + id) = (ac − bd) + i(ad + bc).

Conjugate: $\overline{a + ib} = a - ib$.

Magnitude: $|a + ib| = \sqrt{a^2 + b^2}$.

Show that if x, y are complex numbers, then

(i) $\overline{x + y} = \bar{x} + \bar{y}$,

(ii) $\overline{xy} = \bar{x}\,\bar{y}$,

(iii) $\bar{\bar{x}} = x$,

(iv) $x\bar{x} = |x|^2$,

(v) $x + \bar{x}$ is real,

(vi) $|x|$ is real and nonnegative,

(vii) $|xy| = |x| \cdot |y|$,

(viii) $|x + y| \leq |x| + |y|$.

2. (i) Write a matrix representation of each of the bilinear forms on T> 2 

xoji - x- 2 yi + 2x 2 y 2 , 

and 

4tU\V\ + 4u\V2 + 2u2ih + -\iiiVi. 
(ii) Are these matrices congruent?

(iii) What does your answer to (ii) imply about the two forms? 

3. Prove that congruence over F is an equivalence relation on n × n
matrices.

4. Prove that B is congruent to A if and only if B can be obtained from A 
by identical sequences of elementary row and column operations. 

5. Let A be a skew matrix. 

(i) Show that $a_{ii} = 0$ if $1 + 1 \neq 0$ in F.

(ii) Prove that if B is congruent to A, then B is skew. 

6. Prove the following properties of the conjugate transpose operation on
complex matrices.

(i) (A*)* = A. 

(ii) p(A*) = p(A). 

(iii) (A + B)* = A* + B * 

(iv) (AB)* = B*A*. 

(v) If A is nonsingular, $(A^{-1})^* = (A^*)^{-1}$.

(vi) If A is complex, AA* and A*A are Hermitian.

(vii) If A is Hermitian, then $a_{ii}$ is real for each i; if A is real and
symmetric, A is Hermitian.

(viii) If A is skew-Hermitian, then $a_{ii}$ is pure imaginary for each i; if
A is real and skew, then A is skew-Hermitian.

(ix) Let B be conjunctive to A. If A is Hermitian, so is B ; if A is 
skew-Hermitian, so is B. 

7. Prove that any n × n complex matrix A has a unique representation
A = H + K, where H is Hermitian and K is skew-Hermitian.

8. (i) Prove that H is Hermitian if and only if iH is skew-Hermitian,
where $i^2 = -1$.

(ii) Combine (i) and Theorem 8.6 to deduce a canonical form for
conjunctivity of skew-Hermitian matrices.

9. Let A be an n × n skew-symmetric matrix over a field F in which
$1 + 1 \neq 0$. Recalling the results of Exercise 5, use appropriate row and
column operations to show that A is congruent to a matrix of the diagonal
block form

$$\begin{pmatrix} E_1 & & & \\ & \ddots & & \\ & & E_t & \\ & & & Z \end{pmatrix}, \qquad \text{where } E_i = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

for i = 1, . . . , t. Deduce that A has an even rank, and that this form is
canonical relative to congruence over F for skew-symmetric matrices; that is,
two n × n skew-symmetric matrices are congruent if and only if they have
the same rank.

10. Perform row and column operations to reduce the following skew 
matrix to a congruent matrix in the canonical form stated in Exercise 9: 



/-10 fi 2 \ 

11. Given A = I 5 0 3 J 
\ 2 3 0/ 


(i) Find a matrix congruent to A over the rational field and in the 
form of Theorem 8.5. 

(ii) Find a matrix congruent to A over R and in the form of Theorem
8.7. Determine the rank and signature of A.

(iii) Find a matrix congruent to A over C and in the form of
Theorem 8.10.

(iv) Illustrate that Theorem 8.5 does not describe a canonical form
for congruence over F.

12. Determine the canonical form relative to conjunctivity of the following
Hermitian matrix and determine its rank and signature:

$$\begin{pmatrix} 1 & i & 1 + i \\ -i & 0 & 1 \\ 1 - i & 1 & 2 \end{pmatrix}, \qquad i^2 = -1.$$

13. Prove Theorem 8.10 and deduce that two symmetric complex matrices
are congruent over C if and only if they have the same rank.


§8.2. Inner Product 

Before continuing with a general investigation of the background of metric
notions, let us return to a familiar example to see where we have been and
where we are going. In the real plane $E_2$, the dot product of vectors is a
bilinear function. If $\xi = (x_1, x_2)$ and $\eta = (y_1, y_2)$, then

$$\xi \cdot \eta = f(\xi, \eta) = x_1 y_1 + x_2 y_2.$$

The length of the vector ξ is

$$\|\xi\| = \sqrt{x_1^2 + x_2^2} = [f(\xi, \xi)]^{1/2},$$

which has one important property that length should possess; namely, that
the length of a vector is a positive real number, except for the zero
vector which has zero length.

[Figure 8.1: the vector $\xi = (x_1, x_2)$ in the plane, with length $\|\xi\|$.]

If we attempt to parallel this construction for the two-dimensional
complex space, then the same dot product (with $x_1, x_2, y_1, y_2$ complex) is
a bilinear form; since root extraction is defined for all complex numbers,
it would be possible to define $\|\xi\|$ as we have for the real case. But then
$\|\xi\|$ would be a complex number, hardly suitable for expressing length. It
is for this reason that we use conjugates of complex numbers to define the
conjugate bilinear function

$$g(\xi, \eta) = x_1 \bar{y}_1 + x_2 \bar{y}_2,$$

where $\xi = (x_1, x_2)$ and $\eta = (y_1, y_2)$, each component complex. It is important
to observe that if the components are real, g reduces to the bilinear form
defined above. Now if length is defined by

$$\|\xi\| = [g(\xi, \xi)]^{1/2} = \sqrt{x_1 \bar{x}_1 + x_2 \bar{x}_2},$$

the length of every nonzero vector is a positive real number, as desired.
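
A small numerical illustration may make the difficulty concrete. In the Python sketch below, the function names `bilinear_dot` and `conjugate_bilinear` and the sample vector (1, i) are chosen only for this illustration. For that vector the bilinear form gives the value zero for a nonzero vector, while the conjugate bilinear form gives a positive real number.

```python
def bilinear_dot(xi, eta):
    """The bilinear form f: sum of x_k * y_k, with no conjugation."""
    return sum(x * y for x, y in zip(xi, eta))


def conjugate_bilinear(xi, eta):
    """The conjugate bilinear form g: sum of x_k * conj(y_k)."""
    return sum(x * y.conjugate() for x, y in zip(xi, eta))


xi = (1 + 0j, 1j)  # the nonzero vector (1, i)

# f(xi, xi) = 1 + i^2, which vanishes although xi is not the zero vector.
print(bilinear_dot(xi, xi))

# g(xi, xi) = |1|^2 + |i|^2, a positive real number.
print(conjugate_bilinear(xi, xi))
```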

From this we see that not all bilinear functions lead to a reasonable concept 
of length. The important distinction is the type of symmetry in the two 
vector variables possessed by the functions, for we observe by direct calcula- 
tion that 

$$f(\xi, \eta) = f(\eta, \xi),$$

while

$$g(\xi, \eta) = \overline{g(\eta, \xi)}.$$

Now we are ready to define the concept of an inner product , which is based 
upon these considerations. 


Definition 8.6. Let V be a vector space over R (or C). A real (or
complex) inner product is a function p with domain V × V and range R
(or C) which satisfies

(a) $p(c_1 \xi_1 + c_2 \xi_2, \eta) = c_1 p(\xi_1, \eta) + c_2 p(\xi_2, \eta)$,

(b) $p(\xi, \eta) = \overline{p(\eta, \xi)}$,

(c) $p(\xi, \xi) > 0$ if $\xi \neq 0$, and $p(0, 0) = 0$.

This definition asserts that an inner product is a scalar-valued function
which is conjugate bilinear, (a) and (b), and positive definite, (c). For a real
inner product, (b) reduces to the assertion of symmetry in ξ and η. Thus
a real inner product is a real valued function of two vector variables which
is bilinear, symmetric, and positive definite. In the complex case property
(b) is called conjugate symmetry or Hermitian symmetry. In either case,
(b) implies that p(ξ, ξ) is real.

Definition 8.7.

(a) A real vector space V for which a real inner product is defined is
called a Euclidean space.

(b) A complex vector space V for which a complex inner product is
defined is called a unitary space.

(c) Euclidean spaces and unitary spaces collectively are called inner
product spaces.

Examples of Inner Product Spaces 

(a) Euclidean n-space with the dot product

$$p(\xi, \eta) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

(b) The infinite-dimensional space of all real valued functions continuous
on the interval 0 ≤ t ≤ 1 with the inner product

$$p(f, g) = \int_0^1 f(t)\, g(t)\, dt.$$

(c) Unitary n-space $C_n$ with the inner product

$$p(\xi, \eta) = x_1 \bar{y}_1 + x_2 \bar{y}_2 + \cdots + x_n \bar{y}_n.$$

(d) The infinite-dimensional space of all complex valued functions
continuous on the real interval 0 ≤ t ≤ 1 with the inner product

$$p(f, g) = \int_0^1 f(t)\, \overline{g(t)}\, dt.$$

We conclude this section by deriving a general inequality which has many 
important applications in Euclidean and unitary spaces. 

Theorem 8.11. (The Schwarz inequality.) In any inner product space V

$$|p(\xi, \eta)|^2 \leq p(\xi, \xi)\, p(\eta, \eta)$$

for all ξ, η ∈ V.

proof: For any α, p(α, α) is real and nonnegative. Let $\alpha = a\xi + b\eta$,
where $a = -p(\eta, \xi)$ and $b = p(\xi, \xi)$; then $\bar{b} = b \geq 0$, and $\bar{a} = -p(\xi, \eta)$.
Therefore, we have

$$\begin{aligned}
0 \leq p(a\xi + b\eta,\, a\xi + b\eta) &= a\,p(\xi,\, a\xi + b\eta) + b\,p(\eta,\, a\xi + b\eta) \\
&= a\bar{a}\,p(\xi, \xi) + a\bar{b}\,p(\xi, \eta) + b\bar{a}\,p(\eta, \xi) + b\bar{b}\,p(\eta, \eta) \\
&= a\bar{a}b - ab\bar{a} - b\bar{a}a + bb\,p(\eta, \eta) = b\,[-a\bar{a} + b\,p(\eta, \eta)] \\
&= b\,[p(\xi, \xi)\,p(\eta, \eta) - p(\xi, \eta)\,\overline{p(\xi, \eta)}].
\end{aligned}$$

If b = 0, we have ξ = 0, and the Schwarz inequality is trivially valid with
both sides equal to zero. Otherwise, the last bracket above must be
nonnegative, which is the assertion of the Schwarz inequality.
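
The inequality can also be checked numerically. The Python sketch below is an illustration only, assuming the standard inner product of unitary n-space from example (c); the names `inner`, `schwarz_gap`, and `random_vector` are introduced here and are not part of the text. The quantity computed is the bracket in the last line of the proof, which should be nonnegative and should vanish (up to rounding) when one vector is a scalar multiple of the other.

```python
import random


def inner(xi, eta):
    """Standard inner product on unitary n-space: sum of x_k * conj(y_k)."""
    return sum(x * y.conjugate() for x, y in zip(xi, eta))


def schwarz_gap(xi, eta):
    """Return p(xi, xi) * p(eta, eta) - |p(xi, eta)|^2."""
    return inner(xi, xi).real * inner(eta, eta).real - abs(inner(xi, eta)) ** 2


def random_vector(n):
    """A vector with n complex components, each part drawn from [-1, 1]."""
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(n)]


random.seed(0)  # fixed seed so that repeated runs agree
xi, eta = random_vector(4), random_vector(4)

print(schwarz_gap(xi, eta))                    # nonnegative
print(schwarz_gap(xi, [2j * x for x in xi]))   # zero up to rounding error
```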


Exercises 

1. Verify that examples (b) and (c) actually satisfy the definition of an 
inner product. 

2. If p is any inner product verify that

(i) $p(c\xi, d\eta) = c\bar{d}\,p(\xi, \eta)$,

(ii) $p(0, \eta) = p(\xi, 0) = 0$.

3. If p is any inner product and if $\eta_1$ and $\eta_2$ are vectors such that
$p(\xi, \eta_1) = p(\xi, \eta_2)$ for all ξ, then $\eta_1 = \eta_2$.

4. Carry out in detail the following outline of an alternative proof of the
Schwarz inequality.

(i) For all complex numbers x, y and all vectors ξ, η

$$0 \leq p(x\xi + y\eta,\, x\xi + y\eta) = x\bar{x}\,p(\xi, \xi) + 2\operatorname{Re}[x\bar{y}\,p(\xi, \eta)] + y\bar{y}\,p(\eta, \eta).$$

(ii) Specify x to be real and choose y = p(ξ, η) to obtain the real
quadratic inequality, valid for all real x,

$$0 \leq p(\xi, \xi)\,x^2 + 2\,|p(\xi, \eta)|^2\,x + |p(\xi, \eta)|^2\,p(\eta, \eta).$$

(iii) Apply a criterion for a real quadratic function to be nonnegative,
obtaining the Schwarz inequality.

5. Prove that equality holds in the Schwarz inequality if and only if the
set {ξ, η} is linearly dependent.

6. Show that the following general theorems are direct consequences of the
Schwarz inequality:

(i) If $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are any real numbers, then

$$\left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right).$$

(ii) (Cauchy inequality.) If $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are any
complex numbers, then

$$\left| \sum_{i=1}^{n} x_i \bar{y}_i \right|^2 \leq \left( \sum_{i=1}^{n} |x_i|^2 \right) \left( \sum_{i=1}^{n} |y_i|^2 \right).$$

(iii) If f and g are real functions continuous on the interval a ≤ x ≤ b,
then

$$\left[ \int_a^b f(x)\,g(x)\,dx \right]^2 \leq \int_a^b f^2(x)\,dx \int_a^b g^2(x)\,dx.$$


§8.3. Length, Distance, and Orthogonality

Let V be an inner product space that is either Euclidean or unitary. The 
inner product function p is used to define length, distance, and perpendicu- 
larity; we begin with length. 

Definition 8.8. The length $\|\xi\|$ of a vector ξ is defined by

$$\|\xi\| = [p(\xi, \xi)]^{1/2}.$$

Theorem 8.12. Length has the following properties:

(a) $\|c\xi\| = |c| \cdot \|\xi\|$.

(b) $\|\xi\| > 0$ if $\xi \neq 0$, and $\|0\| = 0$.

(c) $\|\xi + \eta\| \leq \|\xi\| + \|\eta\|$.

proof: Property (a) follows from Exercise 2 of the preceding section,
and property (b) from part (c) of Definition 8.6. To prove property (c)
we first note that the Schwarz inequality can be written

$$|p(\xi, \eta)| = |p(\eta, \xi)| \leq \|\xi\| \cdot \|\eta\|.$$

Since $p(\xi, \eta) = \overline{p(\eta, \xi)}$, $p(\xi, \eta) + p(\eta, \xi)$ is real, and

$$|p(\xi, \eta) + p(\eta, \xi)| \leq |p(\xi, \eta)| + |p(\eta, \xi)| \leq 2\,\|\xi\| \cdot \|\eta\|.$$

Thus

$$\begin{aligned}
\|\xi + \eta\|^2 &= p(\xi + \eta,\, \xi + \eta) \\
&= p(\xi, \xi) + p(\xi, \eta) + p(\eta, \xi) + p(\eta, \eta) \\
&\leq p(\xi, \xi) + |p(\xi, \eta) + p(\eta, \xi)| + p(\eta, \eta) \\
&\leq \|\xi\|^2 + 2\,\|\xi\| \cdot \|\eta\| + \|\eta\|^2,
\end{aligned}$$

from which (c) follows immediately.

For the case of Euclidean n-space where p is the dot product, the length of
$\xi = (x_1, \ldots, x_n)$ is simply the familiar form

$$\|\xi\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

Property (c) is interpreted geometrically as the observation that the length 
of any side of a triangle does not exceed the sum of the lengths of the other 
two sides. Hence (c) is called the triangle inequality. 

Since the points of n-space may be interpreted as n-tuples, or vectors, the
distance between vectors can be regarded as the distance between those points,
or, equivalently, the length of the arrow from one point to the other.

Definition 8.9. The distance d(ξ, η) between two vectors ξ and η is
defined by

$$d(\xi, \eta) = \|\xi - \eta\|.$$

Theorem 8.13. Distance has the following properties:

(a) $d(\xi, \eta) = d(\eta, \xi)$,

(b) $d(\xi, \eta) > 0$ if $\xi \neq \eta$, and $d(\xi, \xi) = 0$,

(c) $d(\xi, \eta) \leq d(\xi, \zeta) + d(\zeta, \eta)$.

proof: Exercise.


Thus the distance which results from any inner product has the familiar
properties of distance as defined by coordinates in analytic geometry: it is
symmetric, is positive for distinct points, and satisfies the triangle inequality.
Any space on which a distance function satisfying these three properties
is defined is called a metric space.

When we come to angle, we must distinguish between Euclidean space and
unitary space. For nonzero vectors ξ and η the Schwarz inequality can be written

$$\frac{|p(\xi, \eta)|}{\|\xi\| \cdot \|\eta\|} \leq 1.$$

If p(ξ, η) is real, as in the Euclidean case, this means that

$$\frac{p(\xi, \eta)}{\|\xi\| \cdot \|\eta\|}$$

is a real number between −1 and +1. Hence it is the cosine of a uniquely
determined angle ψ in the range 0 ≤ ψ ≤ π. In the unitary case p(ξ, η) is
complex, and the corresponding interpretation is not valid. For our work,
however, it is not important to have a measure of the angle between two
vectors, but it is most convenient to have a definition of orthogonality
(perpendicularity). In the real case the necessary definition for orthogonality is
clear because the cosine of the angle between perpendicular nonzero vectors
must be zero, and hence p(ξ, η) = 0. This is the definition we adopt for the
complex case as well.
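
For Euclidean n-space with the dot product, the angle just described can be computed directly. The following Python sketch is an illustration only; the names `dot`, `length`, `angle`, and `orthogonal` are introduced here, and the clamping step is an assumption added to absorb floating-point rounding, not part of the mathematics.

```python
import math


def dot(xi, eta):
    """The dot product of two real n-tuples."""
    return sum(x * y for x, y in zip(xi, eta))


def length(xi):
    """Length defined from the inner product: sqrt(p(xi, xi))."""
    return math.sqrt(dot(xi, xi))


def angle(xi, eta):
    """Angle in [0, pi] between two nonzero real vectors."""
    cosine = dot(xi, eta) / (length(xi) * length(eta))
    # The Schwarz inequality guarantees |cosine| <= 1 in exact arithmetic;
    # clamping only guards against rounding error.
    return math.acos(max(-1.0, min(1.0, cosine)))


def orthogonal(xi, eta):
    """Orthogonality in the sense of the text: the inner product is zero."""
    return dot(xi, eta) == 0


print(angle((1, 0), (1, 1)))       # about pi / 4
print(orthogonal((1, 2), (2, -1)))
```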


Definition 8.10. Two vectors ξ and η are orthogonal if and only if
p(ξ, η) = 0.

Theorem 8.14. In any inner product space V,

(a) ξ is orthogonal to every η ∈ V if and only if ξ = 0,

(b) if ξ is orthogonal to every vector of a set S, then ξ is orthogonal to
the subspace spanned by S,

(c) any set of mutually orthogonal nonzero vectors is linearly
independent.

proof: Exercise.

This theorem hints that several geometric properties which are familiar in
$E_2$ and $E_3$ also hold in any inner product space. For example, (c)
suggests that a basis of mutually orthogonal vectors (a rectangular
coordinate system) always can be found for V; in the next theorem we
construct such a basis by use of projections to split a vector ξ into
orthogonal components, ξ = σ + η. This construction in its general form
is known as the Gram-Schmidt orthogonalization process.



Theorem 8.15. In any finite-dimensional inner product space V, there 
exists a basis consisting of mutually orthogonal vectors. 

proof: If V is one-dimensional, any nonzero vector forms such a
basis. We proceed by induction, assuming the theorem for any space
of dimension k. Let V be of dimension k + 1, and let S be any subspace
of dimension k. By the induction hypothesis there exists an orthogonal
basis $\{\alpha_1, \ldots, \alpha_k\}$ for S. For ξ ∉ S, let

$$\sigma = \sum_{i=1}^{k} \frac{p(\xi, \alpha_i)}{p(\alpha_i, \alpha_i)}\,\alpha_i = \sum_{i=1}^{k} c_i \alpha_i,$$

and let

$$\eta = \xi - \sigma.$$

Clearly, σ ∈ S, and η ≠ 0 since ξ ∉ S. We now show that η is orthogonal to each $\alpha_i$:

$$\begin{aligned}
p(\eta, \alpha_i) &= p(\xi - \sigma,\, \alpha_i) \\
&= p(\xi, \alpha_i) - p\Big(\sum_{j=1}^{k} c_j \alpha_j,\, \alpha_i\Big) \\
&= p(\xi, \alpha_i) - \sum_{j=1}^{k} c_j\, p(\alpha_j, \alpha_i) \\
&= p(\xi, \alpha_i) - c_i\, p(\alpha_i, \alpha_i) \\
&= p(\xi, \alpha_i) - \frac{p(\xi, \alpha_i)}{p(\alpha_i, \alpha_i)}\, p(\alpha_i, \alpha_i) = 0.
\end{aligned}$$

Hence the vectors $\{\alpha_1, \ldots, \alpha_k, \eta\}$ are mutually orthogonal and by
Theorem 8.14 (c) form a basis for $V_{k+1}$.
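
The inductive step of this proof is the Gram-Schmidt process, and it translates directly into a short program. The Python sketch below is an illustration under stated assumptions: it works in Euclidean n-space with the dot product, uses exact rational arithmetic, and the names `dot` and `orthogonal_basis` are introduced here. Each new vector ξ has its projection σ on the span of the vectors already found subtracted from it, and the remainder η is kept when it is nonzero.

```python
from fractions import Fraction


def dot(u, v):
    """The dot product on Euclidean n-space."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_basis(vectors):
    """Return mutually orthogonal nonzero vectors with the same span."""
    basis = []
    for xi in vectors:
        xi = [Fraction(x) for x in xi]
        eta = xi[:]
        for alpha in basis:
            # c is the coefficient p(xi, alpha) / p(alpha, alpha) of the proof.
            c = dot(xi, alpha) / dot(alpha, alpha)
            eta = [e - c * a for e, a in zip(eta, alpha)]
        if any(eta):  # eta is zero exactly when xi lies in the current span
            basis.append(eta)
    return basis


for vector in orthogonal_basis([(1, 1, 0), (1, 0, 1), (0, 1, 1)]):
    print(vector)
```

Dividing each resulting vector by its length would give a normal orthogonal basis; that step is omitted here because it generally leaves the rational numbers.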

The vector σ in the preceding proof is called the orthogonal projection of ξ
on S; the coefficients $c_i$ of σ relative to the α-basis for S can be written

$$c_i = \frac{p(\xi, \alpha_i)}{p(\alpha_i, \alpha_i)} = \frac{\|\xi\|}{\|\alpha_i\|} \cdot \frac{p(\xi, \alpha_i)}{\|\alpha_i\| \cdot \|\xi\|}.$$

In the Euclidean case if the basis vectors are so chosen that $\|\alpha_i\| = 1$, this
reduces to

$$c_i = \|\xi\| \cos \psi_i,$$

where $\psi_i$ is the angle between ξ and $\alpha_i$. The numbers $c_i$ are called direction
numbers of ξ.

Theorem 8.16. If S is any subspace of an inner product space $V_n$, there
exists a unique subspace $S^\perp$ such that

(a) $V_n = S \oplus S^\perp$,

(b) p(σ, σ') = 0 for every σ ∈ S and σ' ∈ $S^\perp$.

proof: Exercise. ($S^\perp$ is called the orthogonal complement of S.)

Now we combine the notions of length and orthogonality to define a normal
orthogonal basis, which is simply a basis of the form $\{\epsilon_1, \ldots, \epsilon_n\}$ for $E_n$, and
show that these metric concepts assume a very familiar form relative to such
a basis.

Definition 8.11. In any inner product space a vector of unit length is
called normal. A set of mutually orthogonal vectors, each of which is
normal, is called a normal orthogonal (or orthonormal) set.

If α is any nonzero vector in an inner product space, then $\|\alpha\|^{-1}\alpha$ is normal.
Hence a normal orthogonal basis is obtained by normalizing in this manner 
each vector of an orthogonal basis. The importance of a normal orthogonal 
basis is revealed by the next theorem which implies that in Euclidean space 
any inner product p assumes the form of the dot product relative to a normal 
orthogonal basis. 

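Theorem 8.17. Let p be any inner product for the space $V_n$, let
$\{\alpha_1, \ldots, \alpha_n\}$ be a normal orthogonal basis for $V_n$, let $\xi = \sum_{i=1}^{n} x_i \alpha_i$,
and let $\eta = \sum_{i=1}^{n} y_i \alpha_i$. Then

$$p(\xi, \eta) = x_1 \bar{y}_1 + x_2 \bar{y}_2 + \cdots + x_n \bar{y}_n.$$

proof:

$$\begin{aligned}
p(\xi, \eta) &= p\Big(\sum_{i=1}^{n} x_i \alpha_i,\; \sum_{j=1}^{n} y_j \alpha_j\Big) \\
&= \sum_{i=1}^{n} x_i\, p\Big(\alpha_i,\; \sum_{j=1}^{n} y_j \alpha_j\Big) \\
&= \sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{n} \bar{y}_j\, p(\alpha_i, \alpha_j)\Big) \\
&= \sum_{i=1}^{n} x_i \bar{y}_i\, p(\alpha_i, \alpha_i) \\
&= \sum_{i=1}^{n} x_i \bar{y}_i,
\end{aligned}$$

where the last equality holds since the $\alpha_i$ are normal, and the preceding
equality holds since the $\alpha_i$ are orthogonal.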
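
A small example shows the theorem at work for an inner product other than the dot product. In the Python sketch below, everything specific is an assumption made for illustration: the inner product on the real plane is taken to be p(ξ, η) = XGY' with the positive definite matrix G shown, and the basis α_1 = (1, 0), α_2 = (−1, 1) is one that happens to be normal orthogonal relative to this p. The inner product of two vectors is then compared with the dot product of their coordinates relative to the α-basis.

```python
# A hypothetical positive definite matrix defining p(xi, eta) = X G Y'.
G = [[1, 1],
     [1, 2]]


def p(xi, eta):
    """The inner product determined by G on the real plane."""
    return sum(xi[i] * G[i][j] * eta[j] for i in range(2) for j in range(2))


# A basis that is normal orthogonal relative to p (not relative to the dot product).
alpha = [(1, 0), (-1, 1)]


def combine(coords, basis):
    """Return the vector sum of coords[i] * basis[i]."""
    return tuple(sum(c * b[k] for c, b in zip(coords, basis))
                 for k in range(len(basis[0])))


x = (2, 3)    # coordinates of xi relative to the alpha-basis
y = (-1, 5)   # coordinates of eta relative to the alpha-basis
xi, eta = combine(x, alpha), combine(y, alpha)

print(p(xi, eta))                        # inner product computed from G
print(sum(a * b for a, b in zip(x, y)))  # dot product of the coordinates
```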

From this it is clear that in any n-dimensional Euclidean space, with the
metric concepts defined by any inner product, if we choose a normal
orthogonal basis, the following familiar formulas for length, distance, and angle
are valid:

$$\|\xi\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2},$$

$$\|\xi - \eta\| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2},$$

$$\cos \psi = \frac{x_1 y_1 + \cdots + x_n y_n}{\sqrt{x_1^2 + \cdots + x_n^2}\,\sqrt{y_1^2 + \cdots + y_n^2}}.$$

This last formula is simply the law of cosines,

$$\|\xi - \eta\|^2 = \|\xi\|^2 + \|\eta\|^2 - 2\,\|\xi\| \cdot \|\eta\| \cos \psi.$$



Figure 8.3 


It is also worth noticing the relationship between matrix multiplication and
a real inner product. If $A = (a_{ij})$, $B = (b_{ij})$, and $AB = (c_{ij})$, then

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = p(\alpha_i, \beta_j),$$

where p is the real dot product of the ith row vector $\alpha_i$ of A with the jth
column vector $\beta_j$ of B. This observation is valid for matrices over any field,
but the dot product referred to has the form of the real dot product.

Exercises 

1. Prove that the following theorems of geometry hold in any Euclidean
space. Illustrate each in $E_2$.

(i) The Pythagorean theorem and its converse:

ξ is orthogonal to η if and only if

$$\|\xi\|^2 + \|\eta\|^2 = \|\xi + \eta\|^2.$$

(ii) The law of cosines:

$$\|\xi - \eta\|^2 = \|\xi\|^2 + \|\eta\|^2 - 2\,\|\xi\| \cdot \|\eta\| \cos \psi.$$

(iii) The diagonals of a rhombus are perpendicular:

if $\|\xi\| = \|\eta\|$, then $p(\xi + \eta,\, \xi - \eta) = 0$.

2. (i) Prove that in any inner product space

$$p(\xi + \eta,\, \xi + \eta) + p(\xi - \eta,\, \xi - \eta) = 2p(\xi, \xi) + 2p(\eta, \eta).$$

(ii) What familiar geometric theorem does (i) assert?

3. State and prove in the language of arbitrary Euclidean space the
theorem that the midpoints of the sides of any quadrilateral are the vertices
of a parallelogram.

4. Let V be an inner product space with $\{\alpha_1, \ldots, \alpha_n\}$ as a normal
orthogonal basis.

(i) Prove Bessel's inequality:

if $p(\xi, \alpha_i) = c_i$ for i = 1, 2, . . . , m ≤ n, then $\sum_{i=1}^{m} |c_i|^2 \leq \|\xi\|^2$.

(ii) Prove Parseval's identity:

$$p(\xi, \eta) = \sum_{i=1}^{n} p(\xi, \alpha_i)\, p(\alpha_i, \eta).$$

(iii) Interpret each in $E_3$ by a diagram.

5. Prove that in any Euclidean space p(α, β) = 0 if and only if

$$\|\alpha + c\beta\| \geq \|\alpha\|$$

for every real number c.

6. Prove Theorem 8.14.

7. Prove that if S is any subspace of an inner product space V, then every
ξ ∈ V has a unique decomposition

$$\xi = \sigma + \eta,$$

where σ ∈ S and η is orthogonal to S.

8. Prove Theorem 8.16.

9. Let S and T be subspaces of an inner product space V. Prove that

(i) $(S^\perp)^\perp = S$.

(ii) $(S + T)^\perp = S^\perp \cap T^\perp$.

(iii) $(S \cap T)^\perp = S^\perp + T^\perp$.

10. Beginning with the orthogonal vectors

$\alpha_1 = (2, 1, -5, 0)$
$\alpha_2 = (3, -1, 1, 0)$

in $E_4$, use the Gram-Schmidt orthogonalization process on the normalized
form of $\alpha_1$ and $\alpha_2$ to obtain a normal orthogonal basis for $E_4$.

11. Prove that the mapping ξ → σ as defined in the proof of Theorem 8.15
by the Gram-Schmidt orthogonalization process is a projection, as defined
in §7.4.


§8.4. Isometries 

In many important physical problems the transformations which arise are 
ones which leave all distances fixed. Such a transformation is called a rigid 
motion , and any motion of a rigid body can be described by a transformation 
of this type. It is intuitively clear that the product of rigid motions is a 
rigid motion, that the identity transformation is a rigid motion, and that each 
rigid motion has an inverse that is a rigid motion. In other words, the set 
of all rigid motions of a space forms a group; for Euclidean space the group 
is called the Euclidean group , and for unitary space the group is called the 
unitary group. 

A simple example of a rigid motion is a translation, wherein each point in 
space is moved the same fixed distance in the same direction: 

ξT = ξ + α for fixed α and all ξ.

But for α ≠ 0, 0T = 0 + α = α, so a nonzero translation is not linear. How-
ever, any rigid motion can be decomposed into the product of a linear trans- 
formation and a translation. It is the linear component of rigid motion which 
we now propose to study. 

Definition 8.12. A linear transformation T on an inner product space V
is said to be an isometry if and only if

$$\|\xi T\| = \|\xi\| \quad \text{for each } \xi \in V.$$

An isometry on a Euclidean space is called an orthogonal transformation;
an isometry on a unitary space is called a unitary transformation.

Clearly, an isometry preserves distance; the following theorem shows that
it also preserves the inner product and hence preserves orthogonality and all
other metric notions that are defined in terms of the inner product.

Theorem 8.18. A linear transformation T on an inner product space V
is an isometry if and only if p(ξ, η) = p(ξT, ηT) for all ξ, η ∈ V.

proof: We begin the proof in complex form:

$$p(\xi + \eta,\, \xi + \eta) = p(\xi, \xi) + p(\xi, \eta) + p(\eta, \xi) + p(\eta, \eta),$$

$$\|\xi + \eta\|^2 = \|\xi\|^2 + p(\xi, \eta) + \overline{p(\xi, \eta)} + \|\eta\|^2.$$

But $p(\xi, \eta) + \overline{p(\xi, \eta)}$ is twice the real part of p(ξ, η), and it is expressed
as an algebraic sum of squares of lengths which are preserved by T. If
p(ξ, η) is real, this means p(ξ, η) is preserved by T. If p(ξ, η) is complex,
a similar calculation for p(ξ + iη, ξ + iη) shows that the imaginary part
of p(ξ, η) is also preserved by T. Conversely, if p is preserved by T,
length is also preserved, since $\|\xi\| = p(\xi, \xi)^{1/2}$.
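
The argument of this proof amounts to recovering the inner product from lengths alone. The Python sketch below illustrates that recovery for the standard inner product of unitary n-space; the function names are introduced here for illustration, and the two formulas used are the ones obtained by expanding $\|\xi + \eta\|^2$ and $\|\xi + i\eta\|^2$ with p conjugate linear in its second variable.

```python
def inner(xi, eta):
    """Standard inner product on unitary n-space: sum of x_k * conj(y_k)."""
    return sum(x * y.conjugate() for x, y in zip(xi, eta))


def norm_sq(xi):
    """Square of the length of xi."""
    return inner(xi, xi).real


def add(xi, eta, scale=1):
    """Return xi + scale * eta, componentwise."""
    return [x + scale * y for x, y in zip(xi, eta)]


def inner_from_lengths(xi, eta):
    """Recover p(xi, eta) using only squared lengths."""
    base = norm_sq(xi) + norm_sq(eta)
    # ||xi + eta||^2 = ||xi||^2 + ||eta||^2 + 2 Re p(xi, eta)
    real_part = (norm_sq(add(xi, eta)) - base) / 2
    # ||xi + i eta||^2 = ||xi||^2 + ||eta||^2 + 2 Im p(xi, eta)
    imaginary_part = (norm_sq(add(xi, eta, 1j)) - base) / 2
    return complex(real_part, imaginary_part)


xi = [1 + 2j, 3j]
eta = [2 - 1j, 1 + 1j]
print(inner(xi, eta))
print(inner_from_lengths(xi, eta))
```

Since an isometry preserves every length on the right-hand sides, it must preserve the inner product assembled from them.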

Next we consider the effect of an isometry on a normal orthogonal basis
$\{\alpha_1, \ldots, \alpha_n\}$. Since T preserves inner product,

$$\|\alpha_i T\| = 1, \quad i = 1, \ldots, n,$$

and

$$p(\alpha_i T,\, \alpha_j T) = \delta_{ij}.$$

Thus an isometry maps a normal orthogonal basis into a normal orthogonal
basis. The row vectors of the matrix A, which represents T relative to this
basis, must therefore be of unit length and mutually orthogonal.

Definition 8.13. A real n × n matrix $A = (a_{ij})$ is said to be orthogonal
if and only if

$$\sum_{k=1}^{n} a_{ik} a_{jk} = \delta_{ij}, \quad i, j = 1, \ldots, n.$$

A complex n × n matrix $A = (a_{ij})$ is said to be unitary if and only if

$$\sum_{k=1}^{n} a_{ik} \bar{a}_{jk} = \delta_{ij}, \quad i, j = 1, \ldots, n.$$

Theorem 8.19. A real n × n matrix A represents an orthogonal
transformation T relative to a normal orthogonal basis if and only if A is
orthogonal. A complex n × n matrix A represents a unitary
transformation T relative to a normal orthogonal basis if and only if A is unitary.

proof: Our previous remarks show that an orthogonal transformation
is represented by an orthogonal matrix relative to a normal orthogonal 
basis. Conversely, the rows of a matrix represent the image of the basis 
vectors under the corresponding transformation. The proof is the same 
for the complex case. 

Theorem 8.20. Any orthogonal or unitary matrix is nonsingular. 

proof : The row vectors are mutually orthogonal and hence linearly 
independent. 

Theorem 8.21. A is an orthogonal matrix if and only if $A^{-1} = A'$. A is
a unitary matrix if and only if $A^{-1} = A^*$.

proof: Let A be orthogonal, and let $AA' = C = (c_{ij})$. Then
$c_{ij} = \sum_{k=1}^{n} a_{ik} a'_{kj} = \sum_{k=1}^{n} a_{ik} a_{jk} = \delta_{ij}$. Hence C = I. Conversely, if $A' = A^{-1}$,
then AA' = I, so $\sum_{k=1}^{n} a_{ik} a_{jk} = \delta_{ij}$. The unitary case may be proved as an
exercise.

Of all the methods described for calculating the inverse of a matrix, this 
is by far the simplest. Unfortunately, not every matrix is orthogonal or 
unitary. 
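
As a concrete check of Theorem 8.21, the Python sketch below verifies AA' = I for one particular matrix. The matrix, which represents a rotation of the real plane with rational entries, and the helper names `transpose`, `multiply`, `identity`, and `is_orthogonal` are all chosen here for illustration.

```python
from fractions import Fraction


def transpose(a):
    """Return the transpose A' of a matrix given as a list of rows."""
    return [list(column) for column in zip(*a)]


def multiply(a, b):
    """Return the matrix product AB."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_orthogonal(a):
    """Test the condition of Definition 8.13 in the form A A' = I."""
    return multiply(a, transpose(a)) == identity(len(a))


A = [[Fraction(3, 5), Fraction(4, 5)],
     [Fraction(-4, 5), Fraction(3, 5)]]

print(is_orthogonal(A))
# The inverse is obtained with no elimination at all: it is the transpose.
print(multiply(transpose(A), A) == identity(2))
```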

Theorem 8.22. If A is orthogonal or unitary, |det A| = 1. 
proof : Exercise. 

Recall that two n X n matrices A and B are similar if and only if they 
represent the same linear transformation relative to two bases; if one basis 
can be carried into the other by an isometry, then 

$$B = PAP^{-1} = \begin{cases} PAP' & \text{if P is orthogonal,} \\ PAP^* & \text{if P is unitary.} \end{cases}$$

In particular, if one works only with normal orthogonal bases, similarity 
coincides with congruence in Euclidean spaces and with conjunctivity in 
unitary spaces. In numerous problems having physical significance, isometries 
are sufficient to carry out the desired changes of coordinates, as we shall see 
in the next section. 

Exercises 

1. (i) Find a matrix representation for the linear transformation which
rotates each vector of the real plane through a fixed angle ψ.

(ii) Prove that this matrix A is orthogonal by verifying AA' = I.

2. Prove Theorem 8.22.

3. (i) Find a matrix representation B for the rigid motion of $E_3$ which
reflects each vector across the line

/-r» = 0 

Ui = 

(ii) Prove that B is orthogonal since it satisfies Definition 8.13. 

(iii) Find $B^{-1}$ and describe the linear transformation it represents.

4. Reason as follows to show that any linear rigid motion of the real plane 
is either a rotation, or a rotation followed by a reflection across an axis. 

(i) Write four quadratic conditions on the elements a, b, c, d of a
real 2 × 2 matrix A that are necessary and sufficient that A be orthogonal.

(ii) Show that the only such matrices are

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} a & b \\ b & -a \end{pmatrix},$$

where $a^2 + b^2 = 1$.

(iii) Show that the former is a rotation through the angle $\cos^{-1} a$,
while the latter is a rotation followed by a reflection across an axis.

5. Complete the proof of the unitary case of Theorem 8.18.

6. Prove the unitary case of Theorem 8.21.

7. Prove that each characteristic value λ of an isometry satisfies |λ| = 1,
and that any two characteristic vectors which are associated with distinct
characteristic values are orthogonal. What can you deduce about the Jordan
form of any matrix that represents an isometry?

8. Does the following matrix represent a rigid motion in $E_3$?

( i -r. v7o\ 

-j i v i7j 

Vio Vio 4 / 

9. Prove that if K is a real skew matrix and if I + K is nonsingular, then

$$(I - K)(I + K)^{-1}$$

is orthogonal.

10. Let $\alpha_i$ denote the ith row of a real n × n matrix A; let $AA' = C = (c_{ij})$.
Prove the following:

(i) $c_{ij} = p(\alpha_i, \alpha_j)$, p an inner product;

(ii) if the $\alpha_i$ are mutually orthogonal, $\det C = [\|\alpha_1\| \cdot \|\alpha_2\| \cdots \|\alpha_n\|]^2$;

(iii) if $|a_{ij}| \leq k$ for i, j = 1, 2, . . . , n and some fixed number k, then
$\|\alpha_i\| \leq k\sqrt{n}$;

(iv) if $|a_{ij}| \leq k$, and if the $\alpha_i$ are mutually orthogonal, then
$|\det A| = \|\alpha_1\| \cdot \|\alpha_2\| \cdots \|\alpha_n\| \leq k^n n^{n/2}$.
§8.5. Hermitian and Quadratic Functions

Now we return to the study of conjugate bilinear functions, begun in
§8.1. We consider a vector space V over the complex numbers and a conjugate
bilinear function f from V × V to C,

$$f(\xi, \eta) \in C;$$

f is linear in the first component and conjugate linear in the second. We also
consider the case in which V is a vector space over the real numbers and f
is a bilinear function from V × V to R. Since conjugate bilinearity then
reduces to bilinearity, we shall first investigate the complex case, noting any
differences which pertain to the real case.

By considering only the numbers $f(\xi, \xi)$ we obtain a function $h$, defined 
on $\mathcal{V}$ by 

$$h(\xi) = f(\xi, \xi).$$

Since $f$ is conjugate symmetric, $h(\xi)$ is real for every vector $\xi$ of the complex 
space $\mathcal{V}$. Indeed it is precisely this property that makes conjugate bilinearity, 
rather than bilinearity, a natural condition to impose for complex spaces. 
Relative to a basis $\{\alpha_1, \ldots, \alpha_n\}$ for $\mathcal{V}$ this function takes the form 

$$h(\xi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f(\alpha_i, \alpha_j)\, x_i \bar{x}_j = XAX^*,$$

where $\xi = \sum_{i=1}^{n} x_i \alpha_i$ and $a_{ij} = f(\alpha_i, \alpha_j)$, as in § 8.1. Thus $h$ is represented 
by the Hermitian matrix $A$, which is uniquely determined by the choice of 
basis. From Theorem 8.8 we conclude that $A$ and $B$ both represent $h$ relative 
to two bases if and only if $A$ and $B$ are conjunctive. 

A real valued function $h$, obtained in this manner from a conjugate bilinear 
function on a complex vector space, is called a Hermitian function. Any 
expression of the form 

$$XAX^* = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i \bar{x}_j,$$

where $A$ is Hermitian, is called a Hermitian form. 

If we begin with a real vector space $\mathcal{V}$ and a bilinear function $f$ from 
$\mathcal{V} \times \mathcal{V}$ to $\mathcal{R}$, then we define a function $q$ from $\mathcal{V}$ to $\mathcal{R}$: 

$$q(\xi) = f(\xi, \xi).$$

Relative to a choice of basis $\{\alpha_1, \ldots, \alpha_n\}$ this function takes the form 

$$q(\xi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f(\alpha_i, \alpha_j)\, x_i x_j = XAX',$$

where $A = (f(\alpha_i, \alpha_j))$ is real and symmetric. Thus $q$ is represented by a real 
symmetric matrix, which is uniquely determined by the choice of basis. 
From Theorem 8.9, two such matrices represent $q$ if and only if they are 
congruent. 

A function $q$, obtained in this manner from a bilinear function on a real 
vector space, is called a quadratic function. Any expression of the form 

$$XAX' = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j,$$

where $A$ is real and symmetric, is called a quadratic form. 

This discussion can be summarized by the following theorems. 

Theorem 8.23. Let $\mathcal{V}_n$ be a complex vector space, and let $h$ be a 
Hermitian function on $\mathcal{V}_n$. Relative to a basis $\{\alpha_1, \ldots, \alpha_n\}$ for $\mathcal{V}_n$, $h$ is 
represented by a uniquely determined Hermitian matrix $A$, where if 
$\xi = \sum_{i=1}^{n} x_i \alpha_i$, then $h(\xi) = XAX^*$. Relative to a basis $\{\beta_1, \ldots, \beta_n\}$, $h$ is 
represented by the Hermitian matrix $B$ if and only if $A$ and $B$ are 
conjunctive over $\mathcal{C}$. 

Theorem 8.24. Let $\mathcal{V}_n$ be a real vector space, and let $q$ be a quadratic 
function on $\mathcal{V}_n$. Relative to a basis $\{\alpha_1, \ldots, \alpha_n\}$ for $\mathcal{V}_n$, $q$ is represented by 
a uniquely determined real symmetric matrix $A$, where if $\xi = \sum_{i=1}^{n} x_i \alpha_i$, 
then $q(\xi) = XAX'$. Relative to a basis $\{\beta_1, \ldots, \beta_n\}$, $q$ is represented by 
the real symmetric matrix $B$ if and only if $A$ and $B$ are congruent over $\mathcal{R}$. 

Each quadratic (or Hermitian) function on a real (or complex) vector space 
$\mathcal{V}$ determines a real symmetric (or Hermitian) matrix relative to a given 
basis, which in turn determines a quadratic (or Hermitian) form. We now 
consider the problem in reverse: starting with a quadratic (or Hermitian) 
form, can we find a quadratic (or Hermitian) function whose values are given 
by that form? To begin with, a quadratic form can be represented by various 
matrices. For example, 

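$$3x_1^2 + 4x_1x_2 - x_2^2$$

can be written as 

$$(x_1 \;\; x_2)\begin{pmatrix} 3 & 4 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad (x_1 \;\; x_2)\begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad (x_1 \;\; x_2)\begin{pmatrix} 3 & 2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$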

and so on. The variations occur from different decompositions of the 
coefficient of $x_i x_j$ into the two numbers $a_{ij}$ and $a_{ji}$. An easy resolution of the 
ambiguity appears if we decompose $A$ as the sum of its symmetric and 
skew-symmetric parts: 

$$A = S + K, \qquad S = \tfrac{1}{2}(A + A'), \quad K = \tfrac{1}{2}(A - A').$$

Then 

$$XAX' = X(S + K)X' = XSX' + XKX';$$

but a simple calculation verifies that 

$$XKX' = 0.$$


Hence only the symmetric component of $A$ contributes to the value of $q(\xi)$, 
and we lose nothing by insisting that the real form 

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j$$

be represented by the real symmetric matrix $B = (b_{ij})$, where $b_{ii} = a_{ii}$ for 
$i = 1, \ldots, n$ and $b_{ij} = b_{ji} = \tfrac{1}{2}(a_{ij} + a_{ji})$. 
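
The symmetrization just described can be checked by a short computation. The following is a minimal sketch, not part of the original development: the helper names `quad_form`, `symmetric_part`, and `skew_part` are our own, and the two matrices `A1` and `A2` are merely two illustrative choices that both produce the form $3x_1^2 + 4x_1x_2 - x_2^2$ used as the example above. Exact rational arithmetic is used so that the comparisons are not blurred by rounding.

```python
from fractions import Fraction

def quad_form(A, x):
    """Evaluate X A X' for the row vector x and the square matrix A."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetric_part(A):
    """S = (A + A')/2, the symmetric component of A."""
    n = len(A)
    return [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]

def skew_part(A):
    """K = (A - A')/2, the skew-symmetric component of A."""
    n = len(A)
    return [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]

# Two of the many matrices that give the form 3*x1^2 + 4*x1*x2 - x2^2.
A1 = [[3, 4], [0, -1]]
A2 = [[3, 1], [3, -1]]

S = symmetric_part(A1)   # the same symmetric matrix results from A2
K = skew_part(A1)
x = [2, -5]              # an arbitrary test vector

print(S == symmetric_part(A2))              # both symmetrize to the same matrix
print(quad_form(K, x))                      # contribution of the skew part
print(quad_form(A1, x) == quad_form(S, x))  # only S affects the value
```

Whatever matrix is chosen to write the form, its symmetric part is the same, which is why the symmetric representative may be taken as the matrix of the form.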

Correspondingly, if for given complex numbers $a_{ij}$ the form 

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i \bar{x}_j$$

is real for all complex vectors $(x_1, \ldots, x_n)$, then the form can be represented 
by the Hermitian matrix $B = (b_{ij})$, where $b_{ii} = a_{ii}$ for $i = 1, \ldots, n$ and 
$b_{ij} = \bar{b}_{ji} = \tfrac{1}{2}(a_{ij} + \bar{a}_{ji})$. (See Exercise 3.) 

Henceforth we shall assume that the forms we consider have been 
appropriately symmetrized: 

$$XAX^*, \quad \text{where } a_{ij} = \bar{a}_{ji}.$$

This condition reduces in the real case to 

$$XAX', \quad \text{where } a_{ij} = a_{ji}.$$

Then for a given basis $\{\alpha_1, \ldots, \alpha_n\}$ for a complex vector space, the function 
$h(\xi) = XAX^*$, where $\xi = \sum_{i=1}^{n} x_i \alpha_i$, is a Hermitian function whose values 
are specified by the given form. Similarly, in the real case, $q(\xi) = XAX'$ is 
a quadratic function. 

Theorems 8.8 and 8.23 show that two Hermitian matrices represent the 
same Hermitian function if and only if they have the same rank and signature. 
Similarly, two real symmetric matrices represent the same quadratic form 
if and only if they have the same rank and signature. 


Definition 8.14. 

(a) The rank of a Hermitian function or of a Hermitian form is the rank 
of any Hermitian matrix which represents that function or form. 

(b) The rank of a quadratic function or of a quadratic form is the rank 
of any real symmetric matrix which represents that function or form. 

(c) The signature of a Hermitian function or form is the signature of 
any Hermitian matrix which represents that function or form. 

(d) The signature of a quadratic function or form is the signature of 
any real symmetric matrix which represents that function or form. 

(e) A Hermitian (quadratic) function is said to be positive definite if and 
only if 

$$h(\xi) > 0 \text{ for all } \xi \ne 0,$$
$$(q(\xi) > 0 \text{ for all } \xi \ne 0).$$

Theorem 8.25. A Hermitian function $h$ on a complex vector space $\mathcal{V}_n$ 
is positive definite if and only if it has rank $n$ and signature $n$. 

proof: According to Theorems 8.6 and 8.8, $h$ is represented relative 
to a suitable basis $\{\alpha_1, \ldots, \alpha_n\}$ by the canonical form under 
conjunctivity, 

$$h(\xi) = x_1\bar{x}_1 + \cdots + x_p\bar{x}_p - x_{p+1}\bar{x}_{p+1} - \cdots - x_r\bar{x}_r,$$

where $s = 2p - r = p - (r - p)$. If $r = n = s$, then $p = n$ and $h(\xi) > 0$ 
for all $\xi \ne 0$. If $r < n$, then $h(\alpha_n) = 0$, and $h$ is not positive definite. If 
$s < n = r$, then $h(\alpha_n) = -1$, and again $h$ is not positive definite. 

Theorem 8.26. A quadratic function q on a real vector space is positive 
definite if and only if it has rank n and signature n. 

proof: Exercise. 

Theorem 8.27. A Hermitian matrix A represents a positive definite 
Hermitian function if and only if 

A = PP* 

for some nonsingular complex matrix P. 

proof: A positive definite Hermitian function is represented in canonical 
form relative to conjunctivity by the identity matrix. Hence $A$ 
represents that function if and only if 

$$A = PIP^*$$

for some nonsingular complex matrix $P$. 

Theorem 8.28. A real symmetric matrix A represents a positive definite 
quadratic function if and only if 

$$A = PP'$$

for some nonsingular real matrix $P$. 

proof: Exercise. 
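
One concrete way to exhibit a matrix $P$ with $A = PP'$ is the Cholesky factorization, which produces a lower triangular $P$ with positive diagonal entries. This is a swapped-in computational technique, not the argument intended for the exercise, and the theorem itself only asserts that some nonsingular $P$ exists. In the sketch below the function names and the sample matrix are our own assumptions; the routine fails exactly when a nonpositive pivot appears, which happens if and only if the real symmetric input is not positive definite.

```python
import math

def cholesky_lower(A):
    """Return a lower triangular P with A = P P' for a real symmetric A.

    Raises ValueError when a pivot is not positive, that is, when A is
    not positive definite.
    """
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Remove the part of a_ij already accounted for by earlier columns.
            s = A[i][j] - sum(P[i][k] * P[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                P[i][i] = math.sqrt(s)
            else:
                P[i][j] = s / P[j][j]
    return P

def times_transpose(P):
    """Compute P P'."""
    n = len(P)
    return [[sum(P[i][k] * P[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical sample matrix, chosen so that the factor has integer entries.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
P = cholesky_lower(A)
print(P)
print(times_transpose(P))   # reproduces A
```

Since the diagonal entries of $P$ are positive, $P$ is nonsingular, so a successful factorization certifies positive definiteness in the sense of Theorem 8.28.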

Examples of Real Symmetric Quadratic Forms 

(a) The expression for the fundamental metric (element of arc length) in 
three-dimensional space is 

$$ds^2 = \begin{cases} dx^2 + dy^2 + dz^2 & \text{in rectangular coordinates,} \\ dr^2 + r^2\,d\theta^2 + dz^2 & \text{in cylindrical coordinates,} \\ d\rho^2 + \rho^2 \sin^2\psi\, d\theta^2 + \rho^2\, d\psi^2 & \text{in spherical coordinates.} \end{cases}$$

In the study of differential geometry the expression for the fundamental 
metric of a surface is of basic importance. 

(b) In classical mechanics the kinetic energy of a particle of mass $m$ and 
having $n$ degrees of freedom is given by 

$$KE = \frac{m}{2} \sum_{i=1}^{n} \left( \frac{dx_i}{dt} \right)^2,$$

where the $x_i$ are the position coordinates of the particle. In more general 
systems the kinetic energy is represented by more complicated quadratic 
forms. 

(c) The equation of a central quadric surface is 

$$ax^2 + bxy + cy^2 + dxz + eyz + fz^2 = k,$$

the left hand member being a quadratic form. The numbers $b$, $d$, and $e$ are 
zero only if the axes of the quadric surface coincide with the coordinate axes; 
in case these are not zero we are interested in finding an orthogonal change of 
coordinates which will simplify the form to a sum of squares. 

Given a Hermitian or quadratic function, it is natural to search for a 
coordinate system relative to which the form that represents that function 
is as simple as possible. The usual situation is that we are given a Hermitian 
or quadratic form, as in Example (c), and we wish to change coordinates in 
such a way that the form is reduced to a sum of squares. With physical and 
geometric applications in mind, we are particularly interested in using only 
rigid motions (isometries) to change coordinates. Our investigation of 
Hermitian and quadratic forms reduces to a study of Hermitian and real 
symmetric matrices; we shall see that the characteristic values and vectors of 
such matrices possess remarkable properties. 

Theorem 8.29. Let A be a matrix which is either Hermitian or real and 
symmetric. Every characteristic value of A is real. 

proof: By either hypothesis, $A = A^*$. If $X$ is a characteristic vector 
associated with the characteristic value $\lambda$, then 

$$XA = \lambda X,$$
$$XAX^* = \lambda XX^*.$$

Clearly $XX^*$ is real and positive; also $c = XAX^*$ is real, since $A$ is 
either Hermitian or real and symmetric. Hence $\lambda$ is the quotient of two 
real numbers. 

Theorem 8.29 implies that the characteristic polynomial, and therefore the 
minimal polynomial of A can be factored as a product of real linear factors. 

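Theorem 8.30. Let $A$ be a matrix which is either Hermitian or real and 
symmetric. If $X_1$ and $X_2$ are characteristic vectors associated with 
distinct characteristic values $\lambda_1$ and $\lambda_2$, then $X_1X_2^* = 0$. 

proof: Let $X_iA = \lambda_iX_i$, $i = 1, 2$. 

Then 

$$(X_1A)X_2^* = \lambda_1X_1X_2^*,$$

$$X_1(AX_2^*) = X_1(X_2A^*)^* = X_1(\lambda_2X_2)^* = \lambda_2X_1X_2^*,$$

where the last step uses the fact that $\lambda_2$ is real (Theorem 8.29). 
Hence $(\lambda_1 - \lambda_2)X_1X_2^* = 0$. Since $\lambda_1 \ne \lambda_2$, we conclude that $X_1X_2^* = 0$. 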

Geometrically, this proves that in a Euclidean space the characteristic 
vectors of a real symmetric matrix are orthogonal whenever they are 
associated with distinct characteristic values, since $X_1X_2' = p(\xi_1, \xi_2)$ relative to a 
normal orthogonal basis. Similarly, in a unitary space two characteristic 
vectors of a Hermitian matrix are orthogonal whenever they are associated with 
distinct characteristic values. 
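
Theorems 8.29 and 8.30 can be watched at work on a $2 \times 2$ Hermitian matrix, where everything is available in closed form. The sketch below is only an illustration under our own assumptions: the matrix is written as $\begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}$ with $a$, $d$ real and $b \ne 0$, the function names are hypothetical, and the row vector $(\bar{b}, \lambda - a)$ is one convenient choice of characteristic vector satisfying $XA = \lambda X$.

```python
import math

def hermitian_2x2_eigen(a, b, d):
    """Characteristic values and row characteristic vectors of
    A = [[a, b], [conj(b), d]], with a, d real and b a nonzero complex number.

    Each returned row X satisfies X A = lambda X.
    """
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    values = (mean + radius, mean - radius)   # both real, as Theorem 8.29 asserts
    vectors = [(b.conjugate(), lam - a) for lam in values]
    return values, vectors

def row_times_matrix(X, A):
    """Compute the row vector X A."""
    return tuple(sum(X[i] * A[i][j] for i in range(2)) for j in range(2))

def hermitian_product(X, Y):
    """Compute X Y* for row vectors X and Y."""
    return sum(x * y.conjugate() for x, y in zip(X, Y))

# Hypothetical sample data.
a, b, d = 2.0, 1 - 1j, 3.0
A = [[a, b], [b.conjugate(), d]]
(lam1, lam2), (X1, X2) = hermitian_2x2_eigen(a, b, d)

print(lam1, lam2)                  # two distinct real characteristic values
print(hermitian_product(X1, X2))   # vanishes, as Theorem 8.30 asserts
```

The two characteristic values are distinct whenever $b \ne 0$, so the hypothesis of Theorem 8.30 is met and the product $X_1X_2^*$ is zero up to rounding.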

The next result is the matrix form of the Principal Axes theorem which 
asserts that any quadratic form may be reduced to the sum of squares by an 
appropriate orthogonal transformation. In particular, the axes of any quadric 
surface are orthogonal, and a rigid motion of the axes of any rectangular 
coordinate system aligns the new coordinate axes with those of the quadric. 
The corresponding result for Hermitian forms is valid also. 

Theorem 8.31. (Principal Axes theorem.) Any real symmetric matrix $A$ 
is simultaneously similar to and congruent to a diagonal matrix $D$; that 
is, there exists an orthogonal matrix $P$ such that 

$$D = PAP^{-1}$$

is diagonal, with the characteristic values of $A$ along the diagonal. 

proof: Let $\{\alpha_1, \ldots, \alpha_n\}$ be a normal orthogonal basis for the Euclidean 
space $\mathcal{V}$, and let $T$ be the linear transformation represented by $A$ in that 
basis. If $\xi$ is any characteristic vector of $T$, then $\|\xi\|^{-1}\xi$ is of unit length, 
characteristic, and associated with the same characteristic value as is $\xi$. 
Let $\beta_1 = \|\xi\|^{-1}\xi$, where $\xi$ is characteristic, associated with $\lambda_1$. Extend to 
a normal orthogonal basis $\{\beta_1, \ldots, \beta_n\}$ for $\mathcal{V}$. Relative to the new basis, 
$T$ is represented by the matrix 

$$B = RAR^{-1} = RAR',$$

where $R$ represents an orthogonal change of basis and is therefore 
orthogonal. Thus $B$ is real and symmetric, of the form 

$$B = \begin{pmatrix} \lambda_1 & Z \\ Z & B_1 \end{pmatrix},$$

where $B_1$ is a real symmetric square matrix of dimension $n - 1$. We 
repeat the argument, selecting $\gamma_2$ as a normal characteristic vector in 
the space $[\beta_2, \ldots, \beta_n]$, letting $\gamma_1 = \beta_1$. Since $\beta_1$ is orthogonal to 
$[\beta_2, \ldots, \beta_n]$, $\gamma_1$ and $\gamma_2$ form a normal orthogonal set that can be extended 
to a normal orthogonal basis $\{\gamma_1, \gamma_2, \ldots, \gamma_n\}$ for $\mathcal{V}$. Then $T$ is 
represented by $C$, where 

$$C = SBS^{-1} = SBS', \quad S \text{ orthogonal},$$

$$C = \begin{pmatrix} \lambda_1 & 0 & Z \\ 0 & \lambda_2 & Z \\ Z & Z & C_1 \end{pmatrix}.$$

Since $R$ and $S$ are orthogonal, so is $SR$, and 

$$C = S(RAR')S' = (SR)A(SR)'.$$

Hence $A$ is simultaneously similar and congruent to $C$; the theorem 
follows after $n$ steps. 

Theorem 8.32. (Principal Axes theorem.) Any Hermitian matrix $A$ is 
simultaneously similar to and conjunctive to a real diagonal matrix $D$ 
having the characteristic values of $A$ as the diagonal elements: 

$$D = PAP^* = PAP^{-1}$$

for some unitary matrix $P$. 

proof : Exercise. 

Observe that Theorems 8.31 and 8.32 are strengthened versions of 
Theorems 8.5 and 8.4, respectively. The reduction of a Hermitian (or real 
symmetric) matrix to diagonal form can be accomplished by means of unitary 
(or orthogonal) transformations. Indeed, it is clear that in selecting a basis 
in the proof of Theorem 8.32 (or 8.31) we could begin with a characteristic 
vector corresponding to a positive characteristic value (if such exists), and 
continue as long as such values remain; then we could select characteristic 
vectors associated with negative characteristic values, as long as they remain, 
obtaining 

$$D = \operatorname{diag}(\lambda_1, \ldots, \lambda_p, \lambda_{p+1}, \ldots, \lambda_r, 0, \ldots, 0),$$

where $\lambda_1, \ldots, \lambda_p$ are positive and $\lambda_{p+1}, \ldots, \lambda_r$ are negative. Clearly $r$ is the 
rank and $2p - r$ is the signature of $A$ and of the Hermitian (or quadratic) 
form that $A$ represents. $D$ is similar and conjunctive (or congruent) to $A$. 
But this is as far as unitary (or orthogonal) transformations can be used in 
reducing $A$ to canonical form relative to conjunctivity (or congruence) as 
described by Theorem 8.6 (or Theorem 8.7). Indeed, $D$ is the Jordan form 
of $A$, the canonical form of $A$ relative to similarity. The further changes 
needed to produce 1 and $-1$ in the nonzero diagonal positions correspond 
to a change of scale along each of the principal axes already obtained. 

Furthermore, we have obtained an alternative description of the rank and 
signature of a Hermitian (or real symmetric) matrix. As we already know, 
the rank of any diagonable matrix is the number of nonzero characteristic values, 
each counted according to its multiplicity as a zero of the characteristic polynomial. 
The signature of a Hermitian (or real symmetric) matrix is the number of 
positive characteristic values minus the number of negative characteristic 
values, again each counted according to its algebraic multiplicity. 

The actual process of reducing a quadratic form to a sum and difference of 
squares can be accomplished by the classical process of successively 
completing squares. However, the Principal Axes theorem provides an alternative 
method, which we describe for quadratic forms although it is valid for 
Hermitian forms as well. Given a quadratic form we write the real symmetric 
matrix $A$ which that form defines. The characteristic values of $A$ are real 
and allow us to write immediately a diagonal matrix that represents the 
form after an orthogonal change of coordinates. If we wish also to describe 
this change of coordinates explicitly, we obtain a complete set of mutually 
orthogonal characteristic vectors $\xi_1, \xi_2, \ldots, \xi_n$, normalized so that each is of 
unit length, expressed in terms of the original coordinates. The matrix $P$ 
whose row vectors are $\xi_1, \ldots, \xi_n$ is then orthogonal, and $PAP' = D$. The 
change of coordinates can be obtained explicitly from $P$. Of course the 
calculation of characteristic values involves finding the zeros of a polynomial 
of degree $n$; this is often difficult. An alternate method of reducing $A$ to 
diagonal form is to use row and column operations as in Theorem 8.4; 
however, the change of coordinates represented by such operations is not 
orthogonal in general. 
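
For a form in two variables the procedure just described can be carried out in closed form. The sketch below is an illustration rather than part of the text's development. It assumes $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, chooses the rotation angle $\theta$ with $\tan 2\theta = 2b/(a - c)$, and takes the rows $(\cos\theta, \sin\theta)$ and $(-\sin\theta, \cos\theta)$ as the unit characteristic vectors. The function names are our own, and the sample form $5x_1^2 + 4x_1x_2 + 2x_2^2$ is chosen only because its characteristic values are simple.

```python
import math

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def principal_axes_2x2(a, b, c):
    """Return (P, D) with P orthogonal and D = P A P' for A = [[a, b], [b, c]].

    The rows of P are unit characteristic vectors of A.
    """
    theta = 0.5 * math.atan2(2 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    P = [[cs, sn], [-sn, cs]]
    A = [[a, b], [b, c]]
    D = mat_mul(mat_mul(P, A), transpose(P))
    return P, D

# The form 5*x1^2 + 4*x1*x2 + 2*x2^2 (hypothetical example).
P, D = principal_axes_2x2(5.0, 2.0, 2.0)
print(D)   # diagonal up to rounding, with the characteristic values on the diagonal
print(P)   # rows give the directions of the principal axes
```

With new coordinates $Y$ defined by $X = YP$, the form $XAX'$ becomes $YDY'$, a sum of squares with the characteristic values as coefficients.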

As a final result, which is of importance in applications such as the solution 
of vibration problems in dynamics, we show that, given any two Hermitian 
(or quadratic) forms, one of which is positive definite, there exists a single 
change of coordinates that diagonalizes both forms. This result is easily 
explained in terms of two central quadric surfaces in three-dimensional 
Euclidean space. Each such surface determines a quadratic form, and positive 
definite forms correspond to ellipsoids. Given a central ellipsoid and a second 
quadric surface, we can rotate axes to align the new coordinate axes with the 
axes of the ellipsoid. This transformation is distance-preserving. Then we change 
scale along the axes of the ellipsoid, deforming it into a sphere for which any 
direction is a principal axis. Then we again rotate axes to align the coordinate 
axes with the principal axes of the second quadric surface. Since the two 
surfaces have a common center, the new equation for each will be of the form 

$$a_1x_1^2 + a_2x_2^2 + a_3x_3^2 = a_4,$$

where $a_1$, $a_2$, and $a_3$ are $\pm 1$. 

Theorem 8.33. Let $A$ and $B$ be $n \times n$ Hermitian matrices. If $A$ is 
positive definite, there exists a nonsingular complex matrix $P$ such that 

$$PAP^* = I,$$

$$PBP^* = D, \text{ where } D \text{ is diagonal.}$$

proof: Since $A$ is positive definite, there exists a nonsingular matrix $Q$ 
such that 

$$QAQ^* = I.$$

Then for any unitary matrix $R$, 

$$RQAQ^*R^* = RIR^* = RR^{-1} = I.$$

Since $B$ is Hermitian, $QBQ^*$ is also Hermitian; so by Theorem 8.32 there 
exists a unitary matrix $R$ such that 

$$RQBQ^*R^* = D,$$

where $D$ is diagonal. Let $P = RQ$. Then $PAP^* = I$, and $PBP^* = D$. 
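
The three steps of this proof translate directly into a computation. The sketch below treats only real symmetric $2 \times 2$ matrices, and several choices in it are our own assumptions rather than anything fixed by the proof: $Q$ is taken to be the inverse of the lower triangular Cholesky factor of $A$, the orthogonal matrix $R$ is the closed-form rotation to principal axes, and the function names and sample matrices are hypothetical.

```python
import math

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def congruence(P, M):
    """Compute P M P'."""
    return mat_mul(mat_mul(P, M), transpose(P))

def simultaneous_diagonalize_2x2(A, B):
    """Return P with P A P' = I and P B P' diagonal.

    A and B are real symmetric 2 x 2 matrices, and A is positive definite.
    """
    # Step 1: A = L L' with L lower triangular, so Q = L^{-1} gives Q A Q' = I.
    l11 = math.sqrt(A[0][0])
    l21 = A[1][0] / l11
    l22 = math.sqrt(A[1][1] - l21 * l21)
    Q = [[1 / l11, 0.0], [-l21 / (l11 * l22), 1 / l22]]
    # Step 2: Q B Q' is symmetric; choose the rotation R to its principal axes.
    M = congruence(Q, B)
    theta = 0.5 * math.atan2(2 * M[0][1], M[0][0] - M[1][1])
    cs, sn = math.cos(theta), math.sin(theta)
    R = [[cs, sn], [-sn, cs]]
    # Step 3: P = R Q, as in the proof.
    return mat_mul(R, Q)

# Hypothetical sample matrices: A is positive definite, B is indefinite.
A = [[2.0, 1.0], [1.0, 2.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
P = simultaneous_diagonalize_2x2(A, B)
PA = congruence(P, A)
PB = congruence(P, B)
print(PA)   # the identity matrix, up to rounding
print(PB)   # a diagonal matrix, up to rounding
```

The diagonal entries of $PBP'$ are the numbers $\lambda$ for which $\det(B - \lambda A) = 0$, since $P(B - \lambda A)P' = D - \lambda I$.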

Exercises 

1. Given the real quadratic form $ax_1^2 + 2bx_1x_2 + cx_2^2$. 

(i) Prove that the form is positive definite if and only if $a > 0$ and 
$b^2 - ac < 0$. 

(ii) Show that the central conic $ax^2 + 2bxy + cy^2 = 1$ is an ellipse or 
hyperbola, depending upon whether the quadratic form on the left has $r = 2$ 
and $s = 2$, or $r = 2$ and $s = 0$. 

2. Represent each of the following quadratic forms by a real symmetric 
matrix, and determine the rank and signature of each. 

(i) $x_1^2 - 2x_1x_3 + 2x_2^2 + 4x_2x_3 + 6x_3^2$. 

(ii) $16x_1x_2 - x_3^2$. 

(iii) $3x_1^2 + 4x_1x_2 + 8x_1x_3 + 4x_2x_3 + 3x_3^2$. 

3. Let $A$ be a complex $n \times n$ matrix with the property that for every 
complex row vector $X$, $XAX^*$ is real. Let $A = H + K$ be the decomposition 
of $A$ as the sum of a Hermitian matrix $H$ and a skew-Hermitian matrix $K$. 
Prove that $XAX^* = XHX^*$. 

4. Prove Theorems 8.26 and 8.28. 

5. Prove Theorem 8.32. 

6. The Taylor expansion of a function $f$ of two variables at $(a, b)$ is 
expressed in terms of the partial derivatives of $f$ by 

$$f(a + h, b + k) = f(a, b) + hf_x(a, b) + kf_y(a, b) + \tfrac{1}{2}\left[h^2f_{xx}(a, b) + 2hkf_{xy}(a, b) + k^2f_{yy}(a, b)\right] + \cdots,$$

provided that $f_{xy} = \dfrac{\partial}{\partial y}\left(\dfrac{\partial f}{\partial x}\right) = \dfrac{\partial}{\partial x}\left(\dfrac{\partial f}{\partial y}\right) = f_{yx}$. If $f_x(a, b) = 0 = f_y(a, b)$, then 
$(a, b)$ is a critical point for maximum or minimum. The term in brackets is 
a quadratic form in $h$ and $k$; if the form has rank 2, it determines whether $f$ 
has a maximum or minimum or neither at $(a, b)$. Assuming $r = 2$, show that 

(i) $f(a, b)$ is a relative maximum if $s = -2$, 

(ii) $f(a, b)$ is a relative minimum if $s = 2$, 

(iii) $f(a, b)$ is neither maximum nor minimum otherwise. 

7. As we know, any quadratic form in three variables can be reduced by 
an orthogonal change of coordinates to the form 

$$\lambda_1x_1^2 + \lambda_2x_2^2 + \lambda_3x_3^2,$$

where the $\lambda_i$ are real. Hence any centrally symmetric quadric surface has an 
equation of the form 

$$\lambda_1x_1^2 + \lambda_2x_2^2 + \lambda_3x_3^2 = 1$$

in a suitable rectangular coordinate system. Use the rank and signature of 
the quadratic form to classify all possible types of centrally symmetric 
quadric surfaces, and identify each type by means of a sketch. 

8. Use the following chain of reasoning to prove Hadamard’s inequality: 
If $A$ is a real $n \times n$ matrix such that $|a_{ij}| \le k$ for all $i, j$, then 

$$|\det A| \le k^n n^{n/2}.$$

(A special case of this result was derived in Exercise 10, § 8.4.) 

(i) Let $B_0 = AA'$; then $B_0$ is real and symmetric. 

(ii) If $A$ is nonsingular, $B_0$ is positive definite. 

(iii) If $B$ is any $n \times n$ real, symmetric, positive definite matrix, if 
$x_1, \ldots, x_n$ are any nonzero numbers, and if $c_{ij} = b_{ij}x_ix_j$, then $C$ is symmetric 
and positive definite. 

(iv) If $C$ is any $n \times n$ real, symmetric, positive definite matrix, then 

$$\det C \le \left(\frac{\operatorname{tr} C}{n}\right)^n.$$

(v) Let $C$ be defined as in (iii), using $B = B_0$ and $x_i = b_{ii}^{-1/2}$. Show 
that $\operatorname{tr} C = n$, and $\det B = \det C \cdot \prod_{i=1}^{n} b_{ii}$. 

(vi) Complete the proof of Hadamard’s inequality. 

9. Use Hadamard’s inequality to show that if $A$ is a real $n \times n$ matrix 
such that $|a_{ij}| \le k$ for all $i, j$, and if the characteristic polynomial of $A$ is 

$$(-1)^n\left[\lambda^n + c_1\lambda^{n-1} + \cdots + c_n\right],$$

then 

$$|c_j| \le \binom{n}{j} j^{j/2} k^j \quad \text{for } j = 1, 2, \ldots, n.$$


§8.6. Normal Matrices and Transformations 

In the last section we saw that each Hermitian matrix and each real sym- 
metric matrix is diagonable, and furthermore that the diagonalization can 
be performed by an isometry, a change of coordinates that corresponds to a 
rigid motion of the coordinate axes. When we first discussed diagonalization 
generally, no inner product had been defined on the underlying vector space, 
so we were not concerned with the type of transformation used to diagonalize 
a matrix, except that it had to be nonsingular. In this section we shall re- 
consider the diagonalization problem, regarding a given matrix as representing 
a linear transformation on an inner product space and restricting our atten- 
tion to isometric transformations. 

We recall that an isometry $T$ is called unitary or orthogonal, according to 
whether the underlying vector space is complex or real; relative to a normal 
orthogonal basis, $T$ is represented by a matrix $P$, which is unitary or 
orthogonal, respectively. A unitary matrix is characterized by the equation $P^* = P^{-1}$, 
an orthogonal matrix by $P' = P^{-1}$. Since the former condition reduces to the 
latter when $P$ is real, we shall use the term isometric to describe a matrix 
that is either unitary or orthogonal; that is, any matrix $P$ for which $P^* = P^{-1}$. 
We shall say that $A$ is isometrically diagonable if and only if $PAP^{-1}$ is diagonal 
for some isometric matrix $P$. 

If $A$ is isometrically diagonable, then for some isometric matrix $P$ and 
some diagonal matrix $D$, we have 

$$\begin{aligned}
D &= PAP^{-1} = PAP^*, \\
D^* &= \bar{D} = PA^*P^* = PA^*P^{-1}, \\
DD^* &= (PAP^{-1})(PA^*P^*) = PAA^*P^*, \\
D^*D &= (PA^*P^{-1})(PAP^*) = PA^*AP^*.
\end{aligned}$$

Since diagonal matrices commute and $P$ is nonsingular, we conclude that 

$$AA^* = A^*A.$$

Hence, a necessary condition that $A$ be isometrically diagonable is that $A$ 
and $A^*$ commute. Our next objective is to prove that this condition is also 
sufficient. 

Definition 8.15. A matrix $A$ of complex numbers is said to be normal 
if and only if 

$$AA^* = A^*A.$$

Since any real number is also a complex number, this definition also applies 
to a real matrix, in which case $AA' = A'A$. 
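
The defining equation is easy to test numerically. The following sketch is only an illustration: the function names and the three sample matrices are our own choices, and the tolerance `tol` is an assumed allowance for rounding. The samples are a Hermitian matrix, a real orthogonal matrix, and a triangular matrix that is not diagonal.

```python
def conjugate_transpose(A):
    """Return A*, the conjugate transpose of A."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_normal(A, tol=1e-12):
    """Check whether A A* = A* A, entry by entry, within tol."""
    A_star = conjugate_transpose(A)
    left = mat_mul(A, A_star)
    right = mat_mul(A_star, A)
    n = len(A)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

hermitian = [[2, 1 - 1j], [1 + 1j, 3]]   # equal to its own conjugate transpose
rotation = [[0, 1], [-1, 0]]             # real orthogonal, no real characteristic values
shear = [[1, 1], [0, 1]]                 # triangular but not diagonal

print(is_normal(hermitian), is_normal(rotation), is_normal(shear))
```

Hermitian and isometric matrices commute with their conjugate transposes for obvious reasons, so both are normal; the third sample fails the test.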

Theorem 8.34. Let A be any n X n matrix and P any n X n isometric 
matrix. Then A is normal if and only if PAP* is normal. 

proof : Exercise. 

The fact that normality is preserved under isometric changes of bases 
suggests a way to proceed. We first show that any matrix may be reduced to 
lower triangular form by means of an isometry. Then we show that any 
lower triangular matrix which is normal must be diagonal. 

Theorem 8.35. If A is any n X n matrix, there exists a unitary matrix 
U such that UAU* is lower triangular. 

proof: Choose any normal orthogonal basis for complex $\mathcal{V}_n$, and let 
$T$ be the linear transformation represented by $A$ relative to that basis. 
Let $\lambda_1$ be a characteristic value of $T$, and $\gamma_1$ a corresponding characteristic 
vector of unit length. Extend to a normal orthogonal basis $\{\gamma_1, \beta_2, \ldots, \beta_n\}$ 
for $\mathcal{V}_n$. Then $T$ is represented relative to the new basis by a matrix 
$B = P_1AP_1^*$, where $P_1$ is unitary and where $B$ has the block form 

$$B = \begin{pmatrix} \lambda_1 & Z \\ B_3 & B_4 \end{pmatrix}.$$

If $n = 1$ or 2, $B$ is lower triangular, and the theorem is proved. For 
$n > 2$, we proceed by induction, assuming that the theorem is valid for 
all $(n - 1) \times (n - 1)$ matrices. Thus there exists a unitary matrix $Q$ 
such that $QB_4Q^*$ is lower triangular. Let $P_2$ be defined by the block form 

$$P_2 = \begin{pmatrix} 1 & Z \\ Z & Q \end{pmatrix}.$$

Since $Q$ is unitary, the row vectors of $P_2$ are orthogonal and of unit 
length. Hence $P_2$ is unitary, and by direct calculation 

$$P_2BP_2^* = \begin{pmatrix} \lambda_1 & Z \\ QB_3 & QB_4Q^* \end{pmatrix},$$

which is lower triangular. Let $U = P_2P_1$. Then $U$ is unitary, and $UAU^*$ 
is lower triangular. 


The reason for using unitary transformations and complex V n in Theorem 
8.35 was necessity rather than convenience: a real matrix need not have any 
real characteristic values. If a real matrix A is orthogonally similar over the 
real numbers to a lower triangular matrix B , the diagonal elements of B are 
real, and these are the characteristic values of B and of A . Conversely, if all 
the characteristic values of A are real, then the proof of Theorem 8.35 can 
be adapted to show that PAP ' is lower triangular for some real orthogonal 
matrix P. 


Theorem 8.36. An $n \times n$ matrix $A$ is normal if and only if there exists 
a unitary matrix $U$ such that $UAU^*$ is diagonal. 

proof: The argument that precedes Definition 8.15 proves the “if” 
part. Suppose $A$ is normal. Then by Theorems 8.34 and 8.35, for some 
unitary matrix $U$, $UAU^*$ is lower triangular and normal: 

$$B = UAU^* = \begin{pmatrix} b_{11} & 0 & \cdots & 0 \\ b_{21} & b_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix}.$$

Equating the expressions for the $(i, j)$ elements of $BB^*$ and $B^*B$, we have 

$$\sum_{k=1}^{n} b_{ik}\bar{b}_{jk} = \sum_{k=1}^{n} b_{kj}\bar{b}_{ki}.$$

But $b_{rs} = 0$ whenever $r < s$. For $i = j = 1$, we have 

$$b_{11}\bar{b}_{11} = b_{11}\bar{b}_{11} + b_{21}\bar{b}_{21} + \cdots + b_{n1}\bar{b}_{n1}.$$

Hence $b_{r1} = 0$ whenever $r > 1$. Then for $i = j = 2$, we have 

$$b_{22}\bar{b}_{22} = b_{22}\bar{b}_{22} + b_{32}\bar{b}_{32} + \cdots + b_{n2}\bar{b}_{n2}.$$

Hence $b_{r2} = 0$ whenever $r > 2$. Continuing in this way, we have $b_{rs} = 0$ 
whenever $r > s$. Hence $B$ is diagonal. 


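As a final consideration let us investigate what $A^*$ means when interpreted 
as a linear transformation on an inner product space. That is, given a linear 
transformation $T$ on a finite-dimensional inner product space $\mathcal{V}$, $T$ is 
represented relative to a normal orthogonal basis by a matrix $A$, for which $A^*$ 
is easily calculated. Relative to the same basis, $A^*$ represents a linear 
transformation on $\mathcal{V}$, say $T^*$. How are $T$ and $T^*$ related? 

Let $\xi = \sum_{i=1}^{n} x_i\alpha_i$ and $\eta = \sum_{j=1}^{n} y_j\alpha_j$. We also have 

$$\alpha_i T = \sum_{j=1}^{n} a_{ij}\alpha_j, \qquad \alpha_i T^* = \sum_{j=1}^{n} \bar{a}_{ji}\alpha_j.$$

Then 

$$\begin{aligned}
p(\xi T, \eta) &= p\Big(\sum_{i=1}^{n} \sum_{j=1}^{n} x_i a_{ij}\alpha_j,\ \sum_{k=1}^{n} y_k\alpha_k\Big), \\
&= \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} x_i a_{ij}\Big)\, p\Big(\alpha_j,\ \sum_{k=1}^{n} y_k\alpha_k\Big), \\
&= \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} x_i a_{ij}\Big)\bar{y}_j, \\
&= \sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{n} a_{ij}\bar{y}_j\Big), \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\bar{y}_j\, p\Big(\sum_{k=1}^{n} x_k\alpha_k,\ \alpha_i\Big), \\
&= \sum_{i=1}^{n} p\Big(\sum_{k=1}^{n} x_k\alpha_k,\ \sum_{j=1}^{n} \bar{a}_{ij} y_j \alpha_i\Big), \\
&= p\Big(\sum_{k=1}^{n} x_k\alpha_k,\ \sum_{j=1}^{n} y_j \Big(\sum_{i=1}^{n} \bar{a}_{ij}\alpha_i\Big)\Big), \\
&= p\Big(\xi,\ \sum_{j=1}^{n} y_j(\alpha_j T^*)\Big), \\
&= p(\xi, \eta T^*).
\end{aligned}$$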

Hence for all $\xi, \eta \in \mathcal{V}$, $T^*$ satisfies the property 

$$p(\xi, \eta T^*) = p(\xi T, \eta).$$

But from Exercise 3, § 8.2 we recall that if $\beta_1$ and $\beta_2$ are vectors such that 
$p(\xi, \beta_1) = p(\xi, \beta_2)$ for all $\xi \in \mathcal{V}$, then $\beta_1 = \beta_2$. This means that a vector is 
uniquely determined by specifying the value of its inner product with each 
vector of the space. Thus $T^*$ is uniquely defined on $\mathcal{V}$ by $T$ and the given 
inner product. 
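
The identity $p(\xi T, \eta) = p(\xi, \eta T^*)$ can be checked in coordinates. The sketch below assumes a normal orthogonal basis, so that $p(\xi, \eta) = XY^*$ for the coordinate row vectors $X$ and $Y$, and it follows the row-vector convention of the text, in which $\xi T$ has coordinates $XA$. The function names and the sample data are hypothetical.

```python
def conjugate_transpose(A):
    """Return A*, the conjugate transpose of A."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def row_times_matrix(X, A):
    """Coordinates of the image of the vector with coordinate row X."""
    return [sum(X[i] * A[i][j] for i in range(len(X)))
            for j in range(len(A[0]))]

def inner(X, Y):
    """p(xi, eta) = X Y* relative to a normal orthogonal basis."""
    return sum(x * y.conjugate() for x, y in zip(X, Y))

# Hypothetical sample data.
A = [[1 + 2j, 3], [-1j, 2 - 1j]]   # matrix of T
X = [1j, 2]                        # coordinates of xi
Y = [3 - 1j, 1 + 1j]               # coordinates of eta

lhs = inner(row_times_matrix(X, A), Y)                       # p(xi T, eta)
rhs = inner(X, row_times_matrix(Y, conjugate_transpose(A)))  # p(xi, eta T*)
print(lhs, rhs)
```

Both sides reduce to $XAY^*$, which is the coordinate form of the long calculation above.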

Definition 8.16. The adjoint of a linear transformation $T$ on an inner 
product space $\mathcal{V}$ is the mapping $T^*$ of $\mathcal{V}$ into $\mathcal{V}$ which is defined by the 
equation 

$$p(\xi, \eta T^*) = p(\xi T, \eta) \quad \text{for all } \xi, \eta \in \mathcal{V}.$$

As an exercise you may verify that $T^*$ is linear; that is, for all $\xi, \eta_1, \eta_2 \in \mathcal{V}$ 
and all scalars $a$, $b$, 

$$p(\xi, (a\eta_1 + b\eta_2)T^*) = p(\xi, a(\eta_1T^*) + b(\eta_2T^*)).$$

Theorem 8.37. Relative to a given normal orthogonal basis, if T is 
represented by a matrix A and T* by a matrix B, then B = A*. 

proof : Exercise. 

Following the terminology introduced for matrices, a normal transformation 
is one that commutes with its adjoint, $TT^* = T^*T$. This class of 
transformations is of particular interest because Theorem 8.36 guarantees that a normal 
transformation decomposes the underlying space into a direct sum of the 
characteristic subspaces of $T$ that furthermore are mutually orthogonal. This 
implies that if $T$ is normal and if $\xi_1$ and $\xi_2$ are characteristic vectors associated 
with distinct characteristic values, then $p(\xi_1, \xi_2) = 0$. An important subclass 
of normal transformations are self-adjoint transformations, $T = T^*$. Since the 
characteristic values of $T^*$ are the complex conjugates of the characteristic 
values of $T$, the characteristic values of a self-adjoint transformation are 
real. Self-adjoint transformations are called Hermitian in the complex case, 
symmetric in the real case. 

If we consider the real case, another question comes to mind. If $T$ is 
represented by $A$, then $T^*$, a transformation on $\mathcal{V}$, is represented by $A'$. But in 
Theorem 4.1 we saw that $A'$ represents a mapping $T'$ of the dual space $\mathcal{V}'$, 
relative to the dual basis. How are $T^*$ and $T'$ related? Even for complex 
inner product spaces it makes sense to ask whether there is a natural relation 
between the adjoint $T^*$ and the transpose $T'$ of $T$, so we shall investigate 
the question in that form. 

Let $\mathcal{V}$ be a finite-dimensional inner product space over the complex 
numbers, $\mathcal{V}'$ the dual space of all linear mappings from $\mathcal{V}$ to $\mathcal{C}$. With each $f \in \mathcal{V}'$ 
we associate the vector $\phi_f \in \mathcal{V}$, defined by 

$$p(\xi, \phi_f) = \xi f \quad \text{for every } \xi \in \mathcal{V}.$$

As we observed previously, $\phi_f$ is uniquely determined because the value of 
its inner product with each vector has been specified. This mapping $\phi$ from 
$\mathcal{V}'$ to $\mathcal{V}$ is one-to-one, for if $\phi_f = \phi_g$, then $\xi f = \xi g$ for all $\xi \in \mathcal{V}$, and $f = g$. 
Conversely, to each $\eta \in \mathcal{V}$ we can associate the function $\psi_\eta$ from $\mathcal{V}$ into $\mathcal{C}$, 
defined by 

$$\xi\psi_\eta = p(\xi, \eta),$$

which is linear because $p$ is linear in its first component. Then $\psi_\eta \in \mathcal{V}'$, so 
$\psi$ is a mapping from $\mathcal{V}$ to $\mathcal{V}'$; furthermore 

$$p(\xi, \phi_{\psi_\eta}) = \xi\psi_\eta = p(\xi, \eta).$$

Hence $\phi_{\psi_\eta} = \eta$, and $\phi$ is onto $\mathcal{V}$. Clearly $\psi = \phi^{-1}$. 

It is a matter of direct computation to verify that 

$$\phi_{f+g} = \phi_f + \phi_g,$$

$$\phi_{af} = \bar{a}\phi_f,$$

since $p$ is conjugate linear in the second component. Hence if $\mathcal{V}$ is a real 
vector space, $\phi$ is a vector space isomorphism from $\mathcal{V}'$ onto $\mathcal{V}$, but if $\mathcal{V}$ is 
complex, $\phi$ is one-to-one onto $\mathcal{V}$ but not an isomorphism. 

Now we are ready to compare $T'$ as a transformation of $\mathcal{V}'$ with $T^*$ as a 
transformation of $\mathcal{V}$. These mappings were defined by the respective equations 

$$\xi(fT') = (\xi T)f \quad \text{for all } \xi \in \mathcal{V},\ f \in \mathcal{V}',$$

$$p(\xi, \eta T^*) = p(\xi T, \eta) \quad \text{for all } \xi, \eta \in \mathcal{V}.$$

What we will show is that these two mappings preserve the one-to-one 
correspondence $\phi$, which exists between the spaces $\mathcal{V}$ and $\mathcal{V}'$: $\phi_fT^* = \phi_{fT'}$ for 
all $f \in \mathcal{V}'$. To see this we let $\xi \in \mathcal{V}$ and $f \in \mathcal{V}'$; from the defining equations 
for $\phi$, $T'$, and $T^*$ we have 

$$\begin{aligned}
p(\xi, \phi_{fT'}) &= \xi(fT'), \\
&= (\xi T)f, \\
&= p(\xi T, \phi_f), \\
&= p(\xi, \phi_fT^*).
\end{aligned}$$

Again using the fact that a vector is uniquely determined by values of its 
inner product with every vector of the space, we conclude that $\phi_fT^* = \phi_{fT'}$. 
Hence for a finite-dimensional real inner product space the isomorphism 
between $\mathcal{V}$ and its dual space $\mathcal{V}'$ permits us to regard $T'$ either as a linear 
transformation on $\mathcal{V}'$ or as a linear transformation on $\mathcal{V}$, with the assurance 
that each pair of corresponding vectors in the two spaces is mapped by $T'$ 
into a pair of corresponding vectors. 

As a further result in this connection, Exercise 8 demonstrates that the 
mapping $\phi$ can be used to obtain an inner product $p^*$ for $\mathcal{V}'$ in terms of a 
given inner product $p$ for $\mathcal{V}$ by defining 

$$p^*(f, g) = p(\phi_g, \phi_f).$$

The reversal of the two components introduces the conjugate that is needed 
in the complex case to compensate for the fact that $\phi$ fails to be an 
isomorphism only because 

$$\phi_{af} = \bar{a}\phi_f.$$


Exercises 

1. Prove Theorem 8.34. 

2. Let A be a real matrix for which every characteristic value is real. 
Deduce that $A$ is orthogonally diagonable if and only if $A$ is normal. 

3. Prove that an n X n matrix A is normal if and only if there exists a 
set of n mutually orthogonal characteristic vectors. 

4. To illustrate Exercises 2 and 3, refer to the matrix A given in Exercise 
l(i), § 7.5. Show that the characteristic values of A are real, that A is diag- 
onable, but that A is not orthogonally diagonable. 

5. Prove that T* is linear. 

6. Prove Theorem 8.37. 

7. Prove the following assertions about a normal transformation $T$ and 
its adjoint. 

(i) $\|\xi T\| = \|\xi T^*\|$ for every $\xi \in \mathcal{V}$. 

(ii) $\lambda$ is a characteristic value of $T$ if and only if $\bar{\lambda}$ is a characteristic 
value of $T^*$. 

(iii) $\xi$ is a characteristic vector of $T$ if and only if $\xi$ is a characteristic 
vector of $T^*$. 

(iv) If $\xi_1$ and $\xi_2$ are characteristic vectors of $T$ associated with distinct 
characteristic values, then $\xi_1$ and $\xi_2$ are orthogonal. 

8. Let $\mathcal{V}$ be a finite-dimensional complex inner product space, $\mathcal{V}'$ its dual 
space, and $\phi$ the one-to-one mapping of $\mathcal{V}'$ onto $\mathcal{V}$ defined in the text. Let 
$p^*$ be the mapping of $\mathcal{V}' \times \mathcal{V}'$ into $\mathcal{C}$ defined by 

$$p^*(f, g) = p(\phi_g, \phi_f).$$

Prove that $p^*$ is an inner product on $\mathcal{V}'$. 

9. Carry out the details of the following derivation of the polar decomposi- 
tion of a nonsingular matrix A as the product of a positive definite Hermitian 
matrix H and an isometric matrix Q. 
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(i) A A* is Hermitian. 

(ii) For some isometry P, PAA*P * = D is diagonal with a positive 
real number in each diagonal position. 

(iii) There exists a diagonal matrix E for which E* = D and e„ > 0 
for every i. 

(iv) Let H = P*EP ; then H is Hermitian and positive definite. 

(v) Let Q = H(A*)~ U , then Q is an isometry. 

(vi) A = HQ. 



CHAPTER 9 


Combinatorial 

Equivalence 


§9.1. Preliminary Remarks

Up to this point most of the material we have considered is central to 
a general study of matrices. This is not to claim that we have investigated 
every central idea, nor that every previous result is basic for any given appli- 
cation of matrix theory. Certainly only the surface has been scratched on 
some important topics, and a considerable body of material remains to be 
investigated. But almost any introductory course in linear algebra will be 
concerned with the material of Chapters 2 to 8, perhaps with a different order- 
ing of these topics and with variations of the degree of generality with which 
each is considered. 

The question of what topics should be studied next would receive different 
answers according to the interests of the individual. For prospective mathe- 
maticians there are many important applications to geometry, analysis, prob- 
ability theory, and algebra. The generalization to infinite-dimensional vector 
spaces is of special interest and importance. The physicist may be more 
interested in applications to Newtonian mechanics, quantum mechanics, and 
relativity, the chemist to crystal structure or spectroscopy. The engineer 
will find applications to elasticity, electrical networks, wave propagation, and 
aircraft flutter. The economist might prefer to learn how to apply matrices 
to linear programming and game theory in order to solve problems in trans- 
portation, logistics, communications, and assignments. The biologist would 
wish to relate matrix theory to genetics, the psychologist to theory of learning 
or to dominance relations, and the sociologist to group relations and social 
customs. 

Such applications of matrix theory are of genuine interest, not only for their
effectiveness in solving significant problems of social and scientific import, but
also because applications stimulate the development of new knowledge about
matrices. Thus there is a strong temptation to discuss a number of these
applications in the remainder of this book. Several considerations militate
against this. First, the fields of application are sufficiently technical that a
brief account of the background material would be necessarily fragmentary
and perhaps superficial. Second, the range of applications is so broad that
the selection of only a few topics would be either biased or capricious.

Therefore you are strongly encouraged to consult other sources for infor- 
mation on applications of matrix theory. In particular, Chapters b and 7 of 
Reference 29 contain an introductory exposition of several of these topics, 
and References 0 and 2b contain extensive references to applications, listed
according to subject matter. 

In the present chapter we shall develop some of the mathematics that is 
used to formulate and to solve a linear program. In the final chapter we shall 
draw upon the content of normal undergraduate work in function theory, 
extending to matrices the concepts of sequences, series, and functions.


§9.2. Linear Inequalities 

As a generalization of the systems of linear equations studied in Chapter 5, 
we now consider a system of linear inequalities of the form 

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &\ge b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &\ge b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &\ge b_m
\end{aligned}
\tag{9.1}
\]

where each $a_{ij}$ and each $b_k$ is real and where $X = (x_1, \ldots, x_n)$ is a vector
in Euclidean n-space $\mathcal{E}_n$. Each expression

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \]

defines a linear functional on $\mathcal{E}_n$, and we previously observed that the set
of all $X \in \mathcal{E}_n$ for which

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i \]

is a hyperplane in $\mathcal{E}_n$, a translation of an $(n - 1)$-dimensional subspace. The
set of all X for which

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \ge b_i \]

is called a closed half-space because it is the set of all points in $\mathcal{E}_n$ that lie
either on a hyperplane or on one side of that hyperplane. Hence the solution
of (9.1) consists of all points of $\mathcal{E}_n$ which are in the intersection of the m
half-spaces determined by the given inequalities. Clearly the solution can be the
void set, a single point, or an infinite set. It seems geometrically obvious,
and is not difficult to prove, that the solution set must be convex, meaning
that whenever two points lie in the set, the line segment joining them also
lies in the set.

As a two-dimensional example consider the system 

\[
\begin{aligned}
4x + 3y &\le 12 \\
x - 2y &\le 0 \\
2x - y &\ge -4.
\end{aligned}
\]

The solution set consists of all points in the triangular region sketched. In 
the example the third inequality relation is opposite in sense to the first two, 



but this can be remedied by multiplying the first two inequalities by —1 to 
obtain a system in the form (9.1). Indeed any system of linear inequalities 
can be converted to the form (9.1). 
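
The conversion to the form (9.1) and the convexity of the solution set can be checked numerically. The following is only a sketch: the names `SYSTEM`, `satisfies`, `on_segment`, and `VERTICES` are illustrative choices, and the three corner points were obtained by solving the boundary equations in pairs.

```python
from fractions import Fraction

# Each constraint is stored as (coefficients, bound), meaning a.x >= bound,
# which is the form (9.1). The two "<=" inequalities were multiplied by -1.
SYSTEM = [
    ((-4, -3), -12),  # 4x + 3y <= 12
    ((-1, 2), 0),     # x - 2y <= 0
    ((2, -1), -4),    # 2x - y >= -4
]

def satisfies(point, system=SYSTEM):
    """True if the point lies in every closed half-space of the system."""
    return all(sum(a * p for a, p in zip(coeffs, point)) >= bound
               for coeffs, bound in system)

def on_segment(p, q, k):
    """The point k*p + (1 - k)*q of the segment joining p and q."""
    return tuple(k * a + (1 - k) * b for a, b in zip(p, q))

# Corners of the triangular region (pairwise intersections of the boundaries).
VERTICES = [
    (Fraction(0), Fraction(4)),
    (Fraction(24, 11), Fraction(12, 11)),
    (Fraction(-8, 3), Fraction(-4, 3)),
]

inside = satisfies((0, 1))    # a point of the region
outside = satisfies((3, 0))   # violates x - 2y <= 0
midpoint = on_segment(VERTICES[0], VERTICES[1], Fraction(1, 2))
```

Exact rational arithmetic is used so that points on the boundary are not misclassified by rounding.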

Now let $(u_1, u_2, \ldots, u_n)$ be any solution of (9.1); define the m-component
vector $(v_1, v_2, \ldots, v_m)$ by

\[ v_i = (a_{i1}u_1 + \cdots + a_{in}u_n) - b_i, \quad \text{for } i = 1, \ldots, m, \]

and observe that

\[ v_i \ge 0 \quad \text{for each } i = 1, 2, \ldots, m. \]

Hence each solution of the system (9.1) of m linear inequalities in n variables
determines a solution of the system of m linear equations in m + n variables,
given by

\[
\begin{aligned}
-v_1 + a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
&\;\;\vdots \\
-v_m + a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{9.2}
\]

and furthermore the values of $v_i$ for that solution are nonnegative. Conversely,
any solution of (9.2) for which the $v_i$ are nonnegative determines a solution
of (9.1). This observation allows us to treat the problem of solving a system
of linear inequalities by the techniques used for solving systems of linear
equations, with the added restriction that certain of the variables must be
nonnegative.

As in § 5.5, (9.2) can be expressed in matrix notation,

\[ (-I \mid A)\begin{pmatrix} V \\ X \end{pmatrix} = B. \]

Here I is m × m, A is m × n, V is m × 1, X is n × 1, and B is m × 1. The
condition that no component of V be negative can be expressed

\[ V \ge Z. \]

An inequality relation for vectors has not been defined previously; in general,
for two real n-component vectors we write

\[ X \ge Y \]

if and only if each component of X is at least as large as the corresponding
component of Y:

\[ x_i \ge y_i \quad \text{for } i = 1, 2, \ldots, n. \]

This relation is reflexive and transitive, but unlike the ordering relation for
real numbers, it does not follow that for any two n-component vectors either
$X \ge Y$ or $Y \ge X$, since some but not all components of X might exceed the
corresponding component of Y. Such a relation is called a partial ordering.
Our discussion can be summarized as follows.


Theorem 9.1. A system $AX \ge B$ of real linear inequalities has a solution
$X_0$ if and only if there exists a vector $V_0 \ge Z$ such that

\[ (-I \mid A)\begin{pmatrix} V_0 \\ X_0 \end{pmatrix} = B. \]
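
As a small illustration of Theorem 9.1, the slack vector V = AX - B can be computed for a trial vector X and compared with Z componentwise. This is a sketch only; the function names `slack` and `vec_ge` are assumptions made for the example, and the matrix is the two-dimensional system of the text written in the form (9.1).

```python
def slack(A, B, X):
    """Return V satisfying -V + AX = B, that is, V = AX - B."""
    return [sum(a * x for a, x in zip(row, X)) - b for row, b in zip(A, B)]

def vec_ge(X, Y):
    """Componentwise relation X >= Y (a partial ordering)."""
    return all(x >= y for x, y in zip(X, Y))

# 4x + 3y <= 12, x - 2y <= 0, 2x - y >= -4, rewritten as AX >= B.
A = [[-4, -3], [-1, 2], [2, -1]]
B = [-12, 0, -4]
Z = [0, 0, 0]

V_inside = slack(A, B, [0, 1])    # X = (0, 1) solves the system
V_outside = slack(A, B, [3, 0])   # X = (3, 0) does not

# Two vectors need not be comparable under the partial ordering.
incomparable = not vec_ge([1, 0], [0, 1]) and not vec_ge([0, 1], [1, 0])
```

A trial vector X solves the inequalities exactly when its slack vector satisfies V ≥ Z.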


There is another interpretation of (9.1) which is sometimes helpful. The
left-hand side of each inequality is simply the inner product of X and a row
vector of A, relative to a normal orthogonal basis for $\mathcal{E}_n$. A solution $X_0$ is
simply a vector whose inner product with each of the m given row vectors $A_i$
is at least the given value $b_i$.


Exercises

1. Sketch the graphs of each of the following systems of linear inequalities,
indicating the solution set.

(i) $x_1 - 2x_2 \ge 4$
    $x_1 + x_2 \le 7$
    $x_2 \ge 0$

(ii) $x_1 - 2x_2 \ge 4$
     $x_1 + x_2 \ge 7$
     $x_1 \ge 0$
     $x_2 \ge 0$

(iii) $x_1 - 2x_2 \le 4$
      $x_1 + x_2 \le 7$
      $2x_1 - x_2 \le -8$
      $x_1 \ge 0$
      $x_2 \ge 0$

2. Determine the maximum and minimum of the function $3x_1 + 2x_2$ on
each of the solution sets of Exercise 1.

3. In $\mathcal{E}_n$ the line segment joining two points $\xi$ and $\eta$ is defined to be the
set of all vectors $\zeta$ of the form

\[ \zeta = k\xi + (1 - k)\eta, \quad 0 \le k \le 1. \]

A subset C of $\mathcal{E}_n$ is said to be convex if and only if each point of the line segment
joining any two points of C also belongs to C.

(i) Prove that any half-space, defined in $\mathcal{E}_n$ by a linear inequality,
is convex.

(ii) Prove that the intersection of two convex sets is convex.

(iii) Deduce that the solution set of a system of linear inequalities is
convex.

4. Given a closed convex subset C of $\mathcal{E}_n$, and a linear form f defined for
each $\xi = (x_1, \ldots, x_n)$ by

\[ f(\xi) = c_1x_1 + c_2x_2 + \cdots + c_nx_n. \]

(i) Show that f is a monotone function along any line segment in $\mathcal{E}_n$.

(ii) Show that if f assumes a maximal or minimal value on C, then
that extremal value must occur on the boundary of C.

(iii) Deduce that if C is defined by a system of linear inequalities, then
any extremal value of f must occur at a “corner” point, the intersection of
two or more half-spaces that bound C.


§9.3. Linear Programming 

In this section we shall see how systems of linear inequalities arise in a 
natural way in problems of economic interest. As usual, however, the mathe- 
matical formulation of such problems and the methods of solution also apply 
to problems which are far removed from economic motivation. We begin 
with a simple example. 

A metal processor wishes to produce an alloy of tin and lead, containing 
at least 60 percent lead and at least 35 percent tin. He can purchase four 
different alloys having the percentage compositions and hundredweight prices 
shown in the table below. 


                       ALLOYS

             A_1     A_2     A_3     A_4     Desired

    Lead      40      60      80      70     ≥ 60

    Tin       60      40      20      30     ≥ 35

    Costs    240     180     160     210     Minimal


How should the processor blend the alloys in order to minimize his costs?
Suppose that $x_1, \ldots, x_4$ represent the proportions of the corresponding alloys
in any hundredweight mixture. The explicit conditions of the problem may
be expressed as a system of linear inequalities of the form (9.1):

\[
\begin{aligned}
\text{Lead:}\quad 40x_1 + 60x_2 + 80x_3 + 70x_4 &\ge 60 \\
\text{Tin:}\quad 60x_1 + 40x_2 + 20x_3 + 30x_4 &\ge 35 \\
\text{Cost:}\quad 240x_1 + 180x_2 + 160x_3 + 210x_4 &= C.
\end{aligned}
\]

But there are also some intrinsic conditions, namely that each $x_i$ be
nonnegative and

\[ x_1 + x_2 + x_3 + x_4 = 1. \]

The problem is to determine the $x_i$ such that all conditions are satisfied and
the value of C is minimized.
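
To make the conditions concrete, a candidate blend can be tested against them and its cost computed. The sketch below assumes nothing beyond the table; the function name `evaluate_blend` and the trial blends are illustrative choices, and the equal-parts blend is merely feasible, not claimed to be optimal.

```python
from fractions import Fraction

LEAD = [40, 60, 80, 70]      # percent lead in alloys A_1, ..., A_4
TIN = [60, 40, 20, 30]       # percent tin
COST = [240, 180, 160, 210]  # price per hundredweight

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def evaluate_blend(x):
    """Return (feasible, cost) for proportions x = (x_1, ..., x_4)."""
    feasible = (all(xi >= 0 for xi in x)
                and sum(x) == 1
                and dot(LEAD, x) >= 60
                and dot(TIN, x) >= 35)
    return feasible, dot(COST, x)

equal_parts = [Fraction(1, 4)] * 4   # 62.5% lead, 37.5% tin
pure_a3 = [0, 0, 1, 0]               # cheapest alloy, but only 20% tin

result_equal = evaluate_blend(equal_parts)
result_a3 = evaluate_blend(pure_a3)
```

The cheapest alloy alone violates the tin requirement, which is why minimizing C is a genuine optimization problem rather than a matter of picking the lowest price.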

The definition of a primal linear programming problem is a direct generalization
of this example. Given a system of m linear inequalities in n variables

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &\ge b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &\ge b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &\ge b_m,
\end{aligned}
\]

the condition that each variable be nonnegative,

\[ x_1 \ge 0, \quad x_2 \ge 0, \quad \ldots, \quad x_n \ge 0, \]

and a linear form

\[ u = c_1x_1 + c_2x_2 + \cdots + c_nx_n; \]

determine, from among all n-tuples $(x_1, \ldots, x_n)$ that satisfy all of the specified
restrictions, one which makes the value of u as small as possible.

Stated in matrix form, a primal linear programming problem becomes this:
given a real m × n matrix A, a real m × 1 vector B, and a real n × 1 vector C;
determine a real n × 1 vector X for which

\[ AX \ge B, \qquad X \ge Z, \]

and such that the linear form

\[ C'X \]

is minimized.

It is also of interest to state the problem in vector space terminology. Of
course A determines a linear transformation and X a vector. In order to
convert to right-hand notation, as in § 5.1, it is necessary to take transposes:
$X'A' \ge B'$. The transformation T corresponding to $A'$ maps $\mathcal{E}_n$ into $\mathcal{E}_m$;
$X'$ corresponds to a vector in $\mathcal{E}_n$ and $B'$ to a vector in $\mathcal{E}_m$. The vector C
determines a linear form on $\mathcal{E}_n$; hence we let $C'$ correspond to a vector in the
dual space $\mathcal{E}_n'$. Thus the primal linear programming problem may be stated
as follows: given a linear transformation T from $\mathcal{E}_n$ to $\mathcal{E}_m$, a vector $\beta \in \mathcal{E}_m$,
and a linear functional $c \in \mathcal{E}_n'$, determine a vector $\xi \in \mathcal{E}_n$ such that

\[ \xi T \ge \beta, \qquad \xi \ge \theta, \]

and such that the value of the linear functional

\[ \xi c \]

is minimized.

The introduction of the dual space suggests a symmetry in the linear
programming problem which is of great importance. We recall that the
transpose $T'$ of T is a mapping from $\mathcal{E}_m'$ to $\mathcal{E}_n'$ defined by the equation

\[ \alpha(fT') = (\alpha T)f \quad \text{for all } \alpha \in \mathcal{E}_n \text{ and all } f \in \mathcal{E}_m'. \]

Hence the situation is as indicated in Figure 9.2.

[Figure 9.2]

The symmetry makes clear that a linear programming problem defined by
T, $\beta$, and c determines a similar problem defined by $T'$, c, and $\beta$. The
symmetry becomes even more apparent if we alter notation so that the elements
of the dual space are written as vectors with Greek letters; for example,
write $\gamma$ for c. Since any vector is also a linear functional on its dual space
we shall write a Greek letter in bold face if we wish to emphasize its role as
a linear functional.

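Now choose any $\xi \in \mathcal{E}_n$ and any $\eta \in \mathcal{E}_m'$ such that $\xi \ge \theta$ and $\eta \ge \theta$. If
$\xi T \ge \beta$, then

\[ \beta\boldsymbol{\eta} \le (\xi T)\boldsymbol{\eta} = \xi(\boldsymbol{\eta} T'). \]

If we further require that $\eta T' \le \gamma$, then we have

\[ \xi(\boldsymbol{\eta} T') \le \xi\boldsymbol{\gamma}. \]

Since the values of $\beta\boldsymbol{\eta}$ and $\xi\boldsymbol{\gamma}$ are obtained as the scalar product of two
m-tuples and two n-tuples respectively, we therefore conclude that

\[ \beta \cdot \eta \le \gamma \cdot \xi. \]

Hence any $\eta \ge \theta$ which satisfies $\eta T' \le \gamma$ determines with $\beta$ a lower bound
of the function which the primal problem seeks to minimize. If we maximize
$\beta \cdot \eta$ for all such $\eta$ we still have a lower bound,

\[ \max(\beta \cdot \eta) \le \gamma \cdot \xi, \]

valid for all $\xi$ which satisfy $\xi \ge \theta$ and $\xi T \ge \beta$. Hence we conclude that

\[ \max(\beta \cdot \eta) \le \min(\gamma \cdot \xi) \]

as $\xi$ is varied over all such vectors.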
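
The inequality just derived can be observed on the alloy data, ignoring (as in Exercise 3 below) the condition that the proportions sum to 1. The following sketch is illustrative only: the helper names and the particular feasible vectors X and Y are assumptions chosen for the example, written in the matrix form AX ≥ B, A'Y ≤ C.

```python
from fractions import Fraction

A = [[40, 60, 80, 70],
     [60, 40, 20, 30]]
B = [60, 35]
C = [240, 180, 160, 210]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_feasible(X):
    """X >= Z and AX >= B."""
    return (all(x >= 0 for x in X)
            and all(dot(row, X) >= b for row, b in zip(A, B)))

def dual_feasible(Y):
    """Y >= Z and A'Y <= C (the columns of A are the rows of A')."""
    return (all(y >= 0 for y in Y)
            and all(dot(col, Y) <= c for col, c in zip(zip(*A), C)))

X = [Fraction(1, 4)] * 4   # a primal-feasible vector
Y = [1, 2]                 # a dual-feasible vector

lower_bound = dot(B, Y)    # value of the dual form at Y
upper_bound = dot(C, X)    # value of the primal form at X
```

Every dual-feasible Y gives a lower bound for the cost and every primal-feasible X an upper bound for the dual form, whether or not either vector is optimal.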

We recall from § 8.5 that in the real case $T' = T^*$ and therefore $T'$ can
be regarded as a mapping from $\mathcal{E}_m$ to $\mathcal{E}_n$ without referring to the dual space;
with $\beta$ and $\eta$ regarded as vectors in the same space the calculation of $\beta \cdot \eta$ is
fully justified. Now it is easy to state what is called the dual linear
programming problem; for comparison we repeat the statement of the primal problem.

Given fixed vectors $\beta \in \mathcal{E}_m$ and $\gamma \in \mathcal{E}_n$ and given a fixed linear
transformation T from $\mathcal{E}_n$ to $\mathcal{E}_m$:

Primal problem. Determine $\xi \in \mathcal{E}_n$ such that $\xi T \ge \beta$, $\xi \ge \theta$, and $\gamma \cdot \xi$ is
minimal.

Dual problem. Determine $\eta \in \mathcal{E}_m$ such that $\eta T' \le \gamma$, $\eta \ge \theta$, and $\beta \cdot \eta$ is
maximal.

Stated in matrix notation, the primal and dual problems of linear programming
assume this form: Given a real m × n matrix A, and column vectors B
and C of dimension m and n, respectively;

Primal problem. Determine a nonnegative column vector X of dimension n
such that

\[ AX \ge B, \]

\[ C'X \text{ is minimal.} \]

Dual problem. Determine a nonnegative column vector Y of dimension m
such that

\[ A'Y \le C, \]

\[ B'Y \text{ is maximal.} \]

The information of both problems can be displayed in a tabular form 
introduced by A. W. Tucker. 


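                          Primal

     0 ≤  |  x_1    x_2   ...   x_n   |   ≥
    ------+---------------------------+-----------
     y_1  |  a_11   a_12  ...   a_1n  |  b_1
     y_2  |  a_21   a_22  ...   a_2n  |  b_2
      .   |   .      .           .    |   .
     y_m  |  a_m1   a_m2  ...   a_mn  |  b_m
    ------+---------------------------+-----------
      ≤   |  c_1    c_2   ...   c_n   |  max \ min
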

For the primal problem the inequalities are read by taking the inner product 
of the top row with each interior row of the table; the linear form to be 
minimized is the inner product of the top and bottom rows. For the dual 
problem columns are used instead of rows. 

In order to state the major theorems of linear programming, some additional
terminology is convenient. A vector $\xi \in \mathcal{E}_n$ is called feasible for the primal
problem if and only if $\xi T \ge \beta$ and $\xi \ge \theta$. A vector $\eta \in \mathcal{E}_m$ is called feasible
for the dual problem if and only if $\eta T' \le \gamma$ and $\eta \ge \theta$. An optimal vector
for the primal problem is a primal-feasible vector $\xi_0$ for which $\gamma \cdot \xi_0$ is minimal.
An optimal vector for the dual problem is a dual-feasible vector $\eta_0$ for which
$\beta \cdot \eta_0$ is maximal.

The fundamental inequality which we derived previously can be stated
thus: if $\xi$ is primal-feasible and if $\eta$ is dual-feasible, then $\beta \cdot \eta \le \gamma \cdot \xi$, so that

\[ \max(\beta \cdot \eta) \le \min(\gamma \cdot \xi). \]

It follows immediately that, if $\xi_0$ and $\eta_0$ are feasible vectors for which

\[ \beta \cdot \eta_0 = \gamma \cdot \xi_0, \]

then both $\xi_0$ and $\eta_0$ are optimal. Furthermore, it can be proved that a
feasible $\xi_0$ is optimal only if that equality holds for some feasible $\eta_0$. This result
is known as the duality theorem of linear programming. Of course the existence
of optimal, or even feasible, vectors is not guaranteed. The fundamental
existence theorem of linear programming asserts that a necessary and sufficient
condition for either the primal or dual problem to have a solution (that is,
an optimal vector) is that both problems have a feasible vector.
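
The sufficiency half of this criterion is easy to apply in practice: a pair of feasible vectors with equal objective values certifies that both are optimal. The sketch below uses a small problem invented for illustration (it does not come from the text), and the function name `certifies_optimality` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical problem: minimize 2*x1 + 3*x2
# subject to x1 + x2 >= 4, x1 + 3*x2 >= 6, x1 >= 0, x2 >= 0.
A = [[1, 1],
     [1, 3]]
B = [4, 6]
C = [2, 3]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_feasible(X):
    return (all(x >= 0 for x in X)
            and all(dot(row, X) >= b for row, b in zip(A, B)))

def dual_feasible(Y):
    return (all(y >= 0 for y in Y)
            and all(dot(col, Y) <= c for col, c in zip(zip(*A), C)))

def certifies_optimality(X, Y):
    """True if X and Y are feasible and the two objective values agree."""
    return primal_feasible(X) and dual_feasible(Y) and dot(B, Y) == dot(C, X)

X0 = [3, 1]
Y0 = [Fraction(3, 2), Fraction(1, 2)]
certified = certifies_optimality(X0, Y0)

# Another feasible vector, for comparison, costs at least as much.
other_cost = dot(C, [6, 0])
```

Here both objective values equal 9, so no feasible X can cost less than 9 and no feasible Y can yield more.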

Even when the existence of a solution is guaranteed, an effective method 
of finding an optimal vector is needed to make linear programming a practical 
tool for decision making. In the next section we briefly discuss the simplex
method and examine some of the matrix theory that underlies the method. 
For a full account of the theory and techniques of linear programming you 
are urged to consult Reference 22 and the excellent bibliography contained 
therein. 


Exercises 

1. Given a general primal linear programming problem as first stated in
the text, write a corresponding statement of the dual problem. Do not use 
matrix or vector notation to abbreviate this statement. 

2. A metal craftsman has on hand a supply of 30 ounces of gold and 
60 ounces of silver from which to make gold rings, silver and gold pins, and
silver earrings. Each ring uses \ ounce of gold, each pin uses \ ounce of gold 
and J ounce of silver, and each pair of earrings uses one ounce of silver. He 
can make a profit of $5 for each ring, $4 for each pin, and $3 for each pair 
of earrings, and he seeks to maximize his profit. 

(i) Write this information in the form of a (dual) linear programming 
problem. 

(ii) Write the inequalities of the corresponding primal problem. 

(iii) Sketch a graph of the inequalities of the primal problem, determine
an optimal solution, and use it to evaluate the minimum of the objective
function of the primal problem.

(iv) Use the duality theorem to find (by inspection) a solution of the 
craftsman’s problem. 

3. Refer to the text example of the metal alloys, ignoring for the moment
the condition that $x_1 + x_2 + x_3 + x_4 = 1$.

(i) State and sketch the graphs of four linear inequalities in $y_1$ and $y_2$
which, together with the linear form $60y_1 + 35y_2$, describe the dual of the
modified primal problem.

(ii) Find an optimal solution of (i) and observe that 168 is the maximum
value of the given linear form. At the same point, the maximum value
of $65y_1 + 35y_2$ is 175.

(iii) Find a feasible solution of the original alloy problem, for which 
the cost function is 175, and explain why this solution is optimal. 


§9.4. Combinatorial Equivalence 

A very effective algorithm for solving any linear programming problem 
has been developed by G. B. Dantzig; it is called the simplex method. We 
shall describe the method generally but not go into details, which are fully 
covered in Reference 22. The first step is to convert the system of linear 
inequalities into the standard form 

in which all components of $V_1$, $X_1$, and $B_1$ are nonnegative. The simplex
algorithm then proceeds in two stages. In the first stage pivot methods are
used to exchange the $v_i$ for some of the $x_j$, leading either to the determination
of a feasible vector or to a proof that no feasible vector exists. If a feasible 
vector exists, the second stage operates with pivots on a modified version 
of the system of equations, finally producing either an optimal vector or a 
proof that no optimal vector exists. 

Our objective here is to study the pivot operation on matrices and to
develop an associated concept of combinatorial equivalence, first introduced
by A. W. Tucker. (See Reference 38.) In contrast to other equivalence
relations we have studied, each m × n matrix is combinatorially equivalent to
only a finite number of matrices. Starting with a linear program defined by
a given matrix, the simplex method examines the finite set of matrices which
are combinatorially equivalent to it, providing after each pivot a means of
deciding whether a solution has been achieved, and if not, indicating which
matrix should be examined next.

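From § 5.5 we recall that if A is a given matrix for which $a_{ij} \ne 0$, then a
pivot operation on $a_{ij}$ results in a matrix which we shall denote by $A^*_{ij}$, where

\[
A^*_{ij} = a_{ij}^{-1}
\begin{pmatrix}
d_{11} & d_{12} & \cdots & -a_{1j} & \cdots & d_{1n} \\
\vdots & \vdots &        & \vdots  &        & \vdots \\
a_{i1} & a_{i2} & \cdots & 1       & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots  &        & \vdots \\
d_{m1} & d_{m2} & \cdots & -a_{mj} & \cdots & d_{mn}
\end{pmatrix}
\]

and where for $r \ne i$ and $s \ne j$,

\[ d_{rs} = a_{rs}a_{ij} - a_{rj}a_{is}. \]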
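
The pivot formula translates directly into a short routine. The following is a minimal sketch, assuming 0-based indices and exact rational arithmetic; the function name `pivot` and the sample matrix are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def pivot(A, i, j):
    """Pivot on A[i][j] (0-based indices); returns a new matrix of Fractions."""
    p = Fraction(A[i][j])
    if p == 0:
        raise ValueError("cannot pivot on a zero entry")
    result = []
    for r, row in enumerate(A):
        new_row = []
        for s, a_rs in enumerate(row):
            if r == i and s == j:
                entry = 1 / p                      # pivot position
            elif r == i:
                entry = Fraction(a_rs) / p         # rest of the pivot row
            elif s == j:
                entry = -Fraction(a_rs) / p        # rest of the pivot column
            else:                                  # d_rs / a_ij elsewhere
                entry = (Fraction(a_rs) * p
                         - Fraction(A[r][j]) * Fraction(A[i][s])) / p
            new_row.append(entry)
        result.append(new_row)
    return result

A = [[2, 1],
     [4, 3]]
once = pivot(A, 0, 0)
twice = pivot(once, 0, 0)   # pivoting again at the same position
```

Pivoting twice at the same position restores the original matrix, which is the self-inverse property used later in the proof of Theorem 9.2.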

From our earlier analysis of systems of linear equations we know that a 
pivot operation always replaces a given system by a system having the same 
solution. Since the rows of A correspond to the equations of the system, a 
permutation of rows corresponds merely to a change in the order in which 
the equations are written. Each column specifies the coefficients of some $x_j$;
hence a column permutation on A simply permutes the variables $x_1, \ldots, x_n$.
Hence any finite sequence of pivots, row permutations, and column permuta- 
tions leads to a system of linear equations which has the same solution as the 
original system. This fact is the motivation of the definition of combinatorial 
equivalence. 

Definition 9.1. Let A and B be two m × n matrices. B is said to be
combinatorially equivalent to A if and only if B can be obtained from A
by a finite sequence of row permutations, column permutations, and
pivot operations on nonzero elements.

As in § 6.2, $P_{ij}$ will denote the elementary matrix obtained by permuting
rows i and j of I. Premultiplication of A by $P_{ij}$ interchanges rows i and j,
and postmultiplication by $P_{ij}$ interchanges columns i and j.

Theorem 9.2. Combinatorial equivalence is an equivalence relation on 
the set of all m X n matrices. 

proof: Clearly the relation is reflexive and transitive. By Exercise 2 (i),
if $a_{ij} \ne 0$ then $(A^*_{ij})^*_{ij} = A$. Also $P_{ij}P_{ij}A = A = AP_{ij}P_{ij}$. Since each of
the basic operations of combinatorial equivalence is self-inverse, the
relation is symmetric.

In the next theorem we establish a matrix characterization of combinatorial
equivalence which is more convenient to apply than the formal definition
and which leads directly to other characterizations.

Theorem 9.3. B is combinatorially equivalent to A if and only if there
exist an (m + n) × (m + n) permutation matrix P and an m × m
nonsingular matrix Q, such that

\[ Q(I \mid A)P = (I \mid B). \]

proof: By a permutation matrix P we mean a product of elementary
permutation matrices $P_{ij}$. Suppose that B can be derived from A by
a finite sequence of row permutations, column permutations, and pivots.
For an elementary permutation matrix $P_{ij}$, where $i, j \le m$, we have

\[ (I \mid P_{ij}A) = P_{ij}(I \mid A)P_{ij}; \]

hence any row permutation of A can be effected by multiplying $(I \mid A)$
on the left by a nonsingular matrix and on the right by a permutation
matrix. Similarly, for $i, j \le n$,

\[ (I \mid AP_{ij}) = (I \mid A)P_{m+i,\,m+j}, \]

so a column permutation on A can be performed by multiplying $(I \mid A)$
on the left by the nonsingular matrix I and on the right by a permutation
matrix. We now show that the same is true for a pivot on any nonzero
element. Beginning with the matrix $(I \mid A)$, let i and j be fixed such that
$a_{ij} \ne 0$. Perform the following operations in succession: for each $k \ne i$
multiply row k by $a_{ij}$; for each $k \ne i$ add $-a_{kj}$ times row i to row k;
permute column i and column m + j; multiply each row by $a_{ij}^{-1}$. It can
be verified that the result is $(I \mid A^*_{ij})$. Since all of the row operations are
elementary and since the only column operation is a permutation $P_{i,\,m+j}$,
we have

\[ Q_0(I \mid A)P_{i,\,m+j} = (I \mid A^*_{ij}). \]

Hence if a finite succession of row permutations, column permutations,
and pivots transforms A into B, then

\[ Q(I \mid A)P = (I \mid B) \]

for some nonsingular matrix Q and some permutation matrix P.

Conversely, suppose that the last equation holds. Write P in any way
as a product of elementary permutation matrices,

\[ P = T_1T_2 \cdots T_k. \]

We proceed by induction on k. If k = 0, we have

\[ Q(I \mid A) = (Q \mid QA) = (I \mid B). \]

Hence Q = I and A = B, so B is combinatorially equivalent to A.
Assume that whenever

\[ Q(I \mid A)P = (I \mid C), \]

where P is a product of fewer than k elementary permutation matrices,
C is combinatorially equivalent to A. Then we have

\[ (I \mid B) = Q(I \mid A)P = Q(I \mid A)\bar{P}T_k, \]

where $\bar{P} = T_1T_2 \cdots T_{k-1}$. Hence

\[ (I \mid B)T_k = Q(I \mid A)\bar{P}. \]

Now $T_k$ transposes two of the m + n columns of the matrix on which
it operates, and we need to distinguish three separate cases, according to
whether $T_k$ transposes

(1) two of the first m columns, or

(2) two of the last n columns, or

(3) one of the first m columns and one of the last n columns.

(1) If $T_k$ transposes two of the first m columns of $(I \mid B)$, then $T_k$ is of
the form $P_{ij}$, where $i, j \le m$. We have

\[ P_{ij}(I \mid B)P_{ij} = (I \mid P_{ij}B) = P_{ij}Q(I \mid A)\bar{P} = \bar{Q}(I \mid A)\bar{P}, \]

so by the induction hypothesis $P_{ij}B$ is combinatorially equivalent to A;
since B is combinatorially equivalent to $P_{ij}B$, B is combinatorially
equivalent to A.

(2) If $T_k$ transposes two of the last n columns of $(I \mid B)$, then $T_k$ is of
the form $P_{m+i,\,m+j}$, where $i, j \le n$. We then have

\[ (I \mid B)P_{m+i,\,m+j} = (I \mid BP_{ij}) = Q(I \mid A)\bar{P}. \]

As in (1), $BP_{ij}$ is combinatorially equivalent to A and to B.

(3) If $T_k$ transposes one of the first m columns and one of the last
n columns, then $T_k$ is of the form $P_{i,\,m+j}$, where $i \le m$ and $j \le n$. We
have

\[ (I \mid B)P_{i,\,m+j} = Q(I \mid A)\bar{P}. \]

From the first part of the proof we know that if $b_{ij} \ne 0$ then row operations
together with $P_{i,\,m+j}$ perform a pivot on $b_{ij}$. Hence

\[ (I \mid B^*_{ij}) = Q_1(I \mid B)P_{i,\,m+j} = Q_1Q(I \mid A)\bar{P}. \]

As before, $B^*_{ij}$ is combinatorially equivalent to A and to B.

In fact, however, it can happen that $b_{ij} = 0$, in which case the pivot
described above cannot be performed. To complete the proof we use an
argument suggested by P. J. L. Grant which makes a more careful
analysis of the permutation P. P is a permutation on $M \cup N$, where
$M = \{1, \ldots, m\}$ and $N = \{m + 1, \ldots, m + n\}$, which maps as many
elements of M into N as it maps elements of N into M. Let
$\{i_1, \ldots, i_t\} \subset M$ and $\{m + j_1, \ldots, m + j_t\} \subset N$ be the two sets of
elements which are interchanged between M and N by P. Then P can
be represented as

\[ P = P_1P_2P_3, \]

where $P_i$ is a product of transpositions of type (i), i = 1, 2, 3, and where

\[ P_3 = P_{i_1,\,m+j_1}P_{i_2,\,m+j_2} \cdots P_{i_t,\,m+j_t}. \]

The subscripts of this representation of $P_3$ are all distinct, so these
transpositions commute. For simplicity we denote the last of these by
$P_{i,\,m+j}$. The equation

\[ (I \mid B) = Q(I \mid A)P = Q(IP_1 \mid AP_2)P_3 = (QP_1 \mid QAP_2)P_3 \]

shows that t columns of Q are simply a rearrangement of the $j_1, \ldots, j_t$
columns of B; the other columns of Q are certain columns of I. This
implies that if $b_{i_s j} = 0$ for s = 1, 2, ..., t, then the corresponding
column of Q is a linear combination of the columns of Q which are
columns of I, contradicting the nonsingularity of Q. Hence for some
$i_s$, $b_{i_s j} \ne 0$. Now we commute $P_{i_s,\,m+j_s}$ to the next to the last position
in $P_3$ and write $i_s = h$ and $j_s = k$ to obtain

\[ P = \bar{P}P_{h,\,m+k}P_{i,\,m+j}, \]

where $\bar{P}$ is the product of $P_1$, $P_2$, and t − 2 transpositions of $P_3$. We have

\[ (I \mid P_{hi}B) = P_{hi}(I \mid B)P_{hi} = P_{hi}Q(I \mid A)\bar{P}P_{h,\,m+k}P_{i,\,m+j}P_{hi}. \]

Now $C = P_{hi}B$ is combinatorially equivalent to B and $c_{ij} = b_{hj} \ne 0$.
Hence if we multiply on the left by a suitable $Q_1$ and on the right by
$P_{i,\,m+j}$, we pivot on $c_{ij}$ to obtain

\[
\begin{aligned}
(I \mid C^*_{ij}) &= Q_1P_{hi}Q(I \mid A)\bar{P}P_{h,\,m+k}P_{i,\,m+j}P_{hi}P_{i,\,m+j} \\
                  &= \bar{Q}(I \mid A)\bar{P}P_{m+j,\,m+k}P_{h,\,m+k},
\end{aligned}
\]

where $C^*_{ij}$ is combinatorially equivalent to C, hence to B. Now $P_{m+j,\,m+k}$
is a type (2) transposition which commutes with each type (3) transposition
appearing in $\bar{P}$, since neither subscript occurs on a type (3)
permutation of $\bar{P}$. In effect, these operations have replaced

\[ (I \mid B) = Q(I \mid A)P \]

by

\[ (I \mid \bar{B}) = \bar{Q}(I \mid A)\tilde{P}, \]

where $\bar{B}$ is combinatorially equivalent to B. P and $\tilde{P}$ can be represented
by the same number of transpositions, but $\tilde{P}$ can be represented by one
fewer transpositions of type (3) than can P. Repeating this argument,
and using the earlier results concerning transpositions of type (1) and (2),
we obtain

\[ (I \mid D) = R(I \mid A)P_{p,\,m+q}, \]

where D is combinatorially equivalent to B. Then $d_{iq} = r_{ip}$ for all i,
so $d_{pq} = r_{pp}$. Except in column p, R coincides with I, so $r_{pp} \ne 0$. Hence
we pivot on $d_{pq}$ to finish the proof.

It is helpful to compare Theorem 9.3 with Theorem 6.12. 

Theorem 9.4. B is combinatorially equivalent to A if and only if some
permutation of the columns of $(I \mid A)$ produces the matrix $(G \mid H)$, where
G is m × m and nonsingular and GB = H.

proof: Suppose $Q(I \mid A)P = (I \mid B)$, and let $(I \mid A)P = (G \mid H)$. Then
$(QG \mid QH) = (I \mid B)$, so I = QG and $B = QH = G^{-1}H$. Conversely, if
G, H exist as stated, then $(I \mid A)P = (G \mid H) = (G \mid GB)$ for some permutation
matrix P. Then $G^{-1}(I \mid A)P = (I \mid B)$.
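
Theorem 9.4 can be checked on a small case: interchanging column i with column m + j of (I | A) should give a pair (G | H) with GB = H, where B is the result of pivoting on the corresponding entry of A. The sketch below is illustrative only; the helpers `pivot` and `matmul` and the sample matrix are assumptions, and indices are 0-based.

```python
from fractions import Fraction

def pivot(A, i, j):
    """Pivot on A[i][j] (0-based); returns a new matrix of Fractions."""
    p = Fraction(A[i][j])
    if p == 0:
        raise ValueError("cannot pivot on a zero entry")
    out = []
    for r, row in enumerate(A):
        new_row = []
        for s, a in enumerate(row):
            if r == i and s == j:
                new_row.append(1 / p)
            elif r == i:
                new_row.append(Fraction(a) / p)
            elif s == j:
                new_row.append(-Fraction(a) / p)
            else:
                new_row.append((Fraction(a) * p
                                - Fraction(A[r][j]) * Fraction(A[i][s])) / p)
        out.append(new_row)
    return out

def matmul(G, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(g * b for g, b in zip(row, col)) for col in zip(*B)]
            for row in G]

A = [[2, 1],
     [4, 3]]
m = len(A)
i, j = 0, 0

# Build (I | A), then interchange column i with column m + j.
IA = [[Fraction(int(r == c)) for c in range(m)] + [Fraction(x) for x in row]
      for r, row in enumerate(A)]
for row in IA:
    row[i], row[m + j] = row[m + j], row[i]

G = [row[:m] for row in IA]   # first m columns
H = [row[m:] for row in IA]   # last n columns
B = pivot(A, i, j)
```

Checking GB = H avoids computing an inverse of G explicitly.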

There are (m + n)! possible permutations of the columns of $(I \mid A)$, not
all of which necessarily produce a nonsingular G in the first m columns. But
for each P, $(I \mid A)P$ is row equivalent to a uniquely determined matrix in
reduced echelon form. Thus Q is uniquely determined by A and P such that
$Q(I \mid A)P$ is in reduced echelon form. This may or may not have the form
$(I \mid B)$ for some B.

Theorem 9.5. There are at most (m + n)! matrices which are combinatorially
equivalent to A; this maximal number is achieved if and
only if every set of m columns of $(I \mid A)$ is linearly independent.

proof: Our previous remarks show that each of the (m + n)! permutations
of the columns of $(I \mid A)$ determines at most one matrix which is
combinatorially equivalent to A. If every set of m columns of $(I \mid A)$ is
linearly independent, then each permutation determines a nonsingular
matrix G as in Theorem 9.4. Then A is combinatorially equivalent to
$G^{-1}H$. If there exists a set of m linearly dependent columns of $(I \mid A)$,
let P be the permutation which places these columns in the first m columns
of $(I \mid A)P$. Since linear dependence of columns is preserved by
row operations, the reduced echelon form of $(I \mid A)P$ has linearly dependent
columns in the first m positions and hence is not of the form
$(I \mid B)$.

Theorem 9.6. B is combinatorially equivalent to A if and only if the
systems of linear equations

\[ Y + AX = Z \]

\[ V + BU = Z \]

are equivalent, where the set of variables $\{v_1, \ldots, v_m, u_1, \ldots, u_n\}$ is
some permutation of the variables $\{y_1, \ldots, y_m, x_1, \ldots, x_n\}$.

proof: If B is combinatorially equivalent to A, then $(I \mid A)P$ is row
equivalent to $(I \mid B)$. Hence the systems

\[ (I \mid A)P\begin{pmatrix} V \\ U \end{pmatrix} = Z \]

and

\[ (I \mid B)\begin{pmatrix} V \\ U \end{pmatrix} = Z \]

are equivalent. Conversely, if these systems are equivalent for some
permutation P of the variables,

\[ \begin{pmatrix} Y \\ X \end{pmatrix} = P\begin{pmatrix} V \\ U \end{pmatrix}, \]

their matrices are row equivalent, so $Q(I \mid A)P = (I \mid B)$.

Theorem 9.7. If $A$ is nonsingular, $A$ and $A^{-1}$ are combinatorially
equivalent.

proof: $Y + AX = Z$ and $X + A^{-1}Y = Z$ are equivalent systems.

This implies, of course, that $A^{-1}$ can be computed by a sequence of pivot
operations. Since the number of such operations need not exceed the dimension
$n$ of $A$, and since each pivot can be performed by $n^2 - n$ multiplications
and $n$ divisions, $A^{-1}$ can be computed by $n^3$ or fewer multiplications and
divisions.
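The operation count above can be checked on a small case. The following Python sketch is an illustration only: the function names `pivot` and `invert_by_pivots` are ours, the pivot formulas are derived from exchanging $y_i$ and $x_j$ in $Y + AX = Z$, and the code assumes that every diagonal pivot encountered is nonzero, so no row or column permutations are needed.

```python
from fractions import Fraction

def pivot(A, i, j):
    """Pivot in place on A[i][j], exchanging y_i and x_j in Y + AX = Z.

    Returns the pair (multiplications, divisions) that were used.
    """
    p = A[i][j]
    if p == 0:
        raise ZeroDivisionError("pivot entry must be nonzero")
    mults = divs = 0
    inv = 1 / p                          # one division
    divs += 1
    for k in range(len(A[i])):           # rest of the pivot row: a_ik / p
        if k != j:
            A[i][k] = A[i][k] / p
            divs += 1
    for l in range(len(A)):              # every other row
        if l == i:
            continue
        c = A[l][j]
        for k in range(len(A[l])):
            if k != j:
                A[l][k] -= c * A[i][k]   # a_lk - a_lj * a_ik / p
                mults += 1
        A[l][j] = -c * inv               # pivot column: -a_lj / p
        mults += 1
    A[i][j] = inv
    return mults, divs

def invert_by_pivots(A):
    """Invert A by n diagonal pivots (assumes each pivot entry is nonzero)."""
    M = [[Fraction(x) for x in row] for row in A]
    mults = divs = 0
    for i in range(len(M)):
        m, d = pivot(M, i, i)
        mults += m
        divs += d
    return M, mults, divs
```

For an $n \times n$ matrix each call to `pivot` uses $n^2 - n$ multiplications and $n$ divisions, so the $n$ pivots together use exactly $n^3$ operations, in agreement with the remark above.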

Theorem 9.8. $B$ is combinatorially equivalent to $A$ if and only if
$-B'$ is combinatorially equivalent to $-A'$.

proof: As an exercise you may verify that

$-(A_{ij})' = (-A')_{ji}.$

Thus any pivot operation on $A$ produces a matrix whose negative transpose
is combinatorially equivalent to $-A'$. The remainder of the proof
follows easily from this fact.

Finally we remark that the statement of Theorem 9.8 would not be valid if 
either the negative signs or the transpose signs were omitted. This essential 
combination of negative and transpose expresses the form of duality which 
we observed for linear programs. 





Exercises 

1. Assume that combinatorial equivalence had been defined by means
of the equation $Q(I \mid A)P = (I \mid B)$ as described in Theorem 9.3. Prove that
the resulting relation is an equivalence relation.

2. Given a matrix A for which a,,-, a,t, and a*, are nonzero, prove the 
following statements. 

(i) (AM = A. 

(ii) (A*j)Ij ~ B,icA*j, if k ^ i. 

(iii) (A *;)** = A*icP ,k, if k ^ j. 

(iv) = P ,lA, if k ^ i. 

(v) ((A*j) tk)i] = APk„ \i k ^ j. 

3. (i) Write out all matrices which are combinatorially equivalent to



By Theorem 9.5 there are at most 24 (actually 20) such matrices, which can 
be computed by four pivot operations together with row and column per- 
mutations. 

(ii) Verify that $A^{-1}$ is in the list.

4. Calculate the inverse of each matrix of Exercise 1, § 6.3, by means of 
pivot operations. 

5. Verify in detail the statements in the proof of Theorem 9.8. 

6. Describe explicitly the uniquely determined matrix $R$ such that if
$a_{ij} \neq 0$ then $(I \mid A_{ij}) = R(I \mid A)P_{i,\,m+j}$.
§10.1. Sequences and Series of Matrices

It is assumed that you are already familiar with the basic facts about 
infinite sequences and series of complex numbers, or at least about real 
power series. In particular we shall refer to the following: 

The definitions of convergence of infinite sequences and series of numbers. 

The circle of convergence of a complex power series (interval of convergence 
in the real case). 

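Taylor series expansions for such functions as

$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, valid for all complex $x$,

$\sin x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}$, valid for all complex $x$,

$\cos x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}$, valid for all complex $x$,

$\log (1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n}$, valid for all complex $x$ such that $|x| < 1$,

$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n$, valid for all complex $x$ such that $|x| < 1$.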


Definition 10.1. A sequence $\{A^{(k)}\} = A^{(1)}, A^{(2)}, \dots, A^{(k)}, \dots$ of $m \times n$
matrices is a function whose domain is the natural numbers and whose
range is a set of $m \times n$ matrices. Let $A^{(k)} = (a^{(k)}_{ij})$. The sequence $\{A^{(k)}\}$
is said to converge to the matrix $A = (a_{ij})$ if and only if for every
$i = 1, 2, \dots, m$ and every $j = 1, 2, \dots, n$, the number sequence $\{a^{(k)}_{ij}\}$
converges to $a_{ij}$.
Example


Let 



Then $\{A^{(k)}\}$ converges to $I$, since $\{a^{(k)}_{ij}\}$ converges to $\delta_{ij}$ for $i, j = 1, 2$.


Theorem 10.1. If $\{A^{(k)}\}$ converges to $A$, if $P$ is a fixed $h \times m$ matrix,
and if $Q$ is a fixed $n \times t$ matrix, then $\{PA^{(k)}Q\}$ converges to $PAQ$.

proof: Let $B^{(k)} = PA^{(k)}Q$. For fixed $i$, $j$,

$b^{(k)}_{ij} = \sum_{r=1}^{m} \sum_{s=1}^{n} p_{ir}\, a^{(k)}_{rs}\, q_{sj}.$

By hypothesis, $\{a^{(k)}_{rs}\}$ converges to $a_{rs}$ for every $r = 1, 2, \dots, m$ and
every $s = 1, 2, \dots, n$. Hence $\{p_{ir}\, a^{(k)}_{rs}\, q_{sj}\}$ converges to $p_{ir}\, a_{rs}\, q_{sj}$, and

$\left\{ \sum_{r=1}^{m} \sum_{s=1}^{n} p_{ir}\, a^{(k)}_{rs}\, q_{sj} \right\}$ converges to $\sum_{r=1}^{m} \sum_{s=1}^{n} p_{ir}\, a_{rs}\, q_{sj}$,

since a linear combination of convergent number sequences converges to
the same linear combination of their limits. Hence $\{PA^{(k)}Q\}$ converges
to $PAQ$.


The significance of this theorem is that convergence is preserved under the 
various equivalence relations that we have considered for matrices, and 
particularly by similarity. Thus it is possible to define convergence of a 
sequence of linear transformations by means of convergence of the matrices 
that represent those linear transformations in any coordinate system. 

Definition 10.2. An infinite series

$\sum_{k=0}^{\infty} A^{(k)} = A^{(0)} + A^{(1)} + \cdots + A^{(k)} + \cdots$

of $m \times n$ matrices is said to converge to the matrix $A$ if and only if for
every $i = 1, 2, \dots, m$ and every $j = 1, 2, \dots, n$, the series $\sum_{k=0}^{\infty} a^{(k)}_{ij}$
converges to $a_{ij}$.

Thus convergence of a series of matrices is defined by the convergence of 
the mn number series of the elements in the same position. We shall consider 
only power series of matrices, that is, series for which 

$A^{(k)} = a_k X^k,$

where $a_k$ is a scalar and $X$ is a square matrix. (Previously we used $X$ only
to denote row vectors.)
Definition 10.3. Given a scalar power series, $\sum_{k=0}^{\infty} a_k x^k$, let $f$ be the
function defined by

$f(x) = \sum_{k=0}^{\infty} a_k x^k$

for all $x$ for which the series converges. The matrix-valued function $f$ of
the square matrix $X$ is defined by

$f(X) = \sum_{k=0}^{\infty} a_k X^k, \quad X^0 = I,$

for all matrices $X$ for which the series converges.


We recognize, of course, that the Hamilton-Cayley theorem makes it possible
to avoid calculating powers of $X$ higher than $n - 1$, since $X^n$ is a linear
combination of lower powers of $X$. If all such higher powers are thus
converted, the infinite series

$f(X) = \sum_{k=0}^{\infty} a_k X^k$

is changed into the form

$f(X) = \sum_{k=0}^{n-1} s_k X^k,$

where each $s_k$ is an infinite series of scalars. The convergence of $f(X)$ is then
a question of the convergence of each of the $s_k$. (In this connection see
Exercise 4.)

It is clear that Definition 10.3 is an extension of the correspondence between
scalar polynomials and matrix polynomials with scalar coefficients. When
we speak of the matrix functions $e^X$, $(I - X)^{-1}$, $\cos X$, we mean

$e^X = \sum_{k=0}^{\infty} \frac{X^k}{k!},$

$(I - X)^{-1} = \sum_{k=0}^{\infty} X^k,$

$\cos X = \sum_{k=0}^{\infty} (-1)^k \frac{X^{2k}}{(2k)!}.$

Two questions arise immediately : 

For what matrices X does a power series converge? 

If a series converges for some X, to what matrix does it converge? 

Our method of answering these questions will be to reduce $X$ to Jordan
form $J$ with diagonal blocks of a simple form, to answer the questions for $J$,
and then to extract corresponding answers for $X$. The following three
theorems, which may be proved as exercises, pertain to this method. In these
theorems $f$ refers to a function defined by a power series, $\sum_{k=0}^{\infty} a_k X^k$.
Theorem 10.2. If $f(X)$ converges and if $Y = PXP^{-1}$, then $f(Y)$
converges to $Pf(X)P^{-1}$.

proof: Exercise. Apply Theorem 10.1 and the definition of
convergence of a series of matrices.


Theorem 10.3. If $X$ is a diagonal block matrix,

$X = \begin{pmatrix} X_1 & Z \\ Z & X_2 \end{pmatrix}$, $X_1$ and $X_2$ square matrices,

then

(a) $f(X)$ converges if and only if $f(X_1)$ and $f(X_2)$ converge, and

(b) if $f(X)$ converges, then

$f(X) = \begin{pmatrix} f(X_1) & Z \\ Z & f(X_2) \end{pmatrix}.$

proof: Exercise. Observe that this theorem can be generalized by
induction to the case in which there are any finite number of diagonal
blocks.


Theorem 10.4. If X is nilpotent, f(X) converges. 
proof: Exercise. 
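Theorem 10.4 can be illustrated computationally: for a nilpotent $n \times n$ matrix the series has at most $n$ nonzero terms, so $f(X)$ is a finite sum. The Python sketch below is ours, not part of the text; the helper names are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import factorial

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def f_of_nilpotent(coeff, X):
    """Sum coeff(k) * X^k for k = 0, ..., n-1, where X is nilpotent n x n."""
    n = len(X)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # X^0 = I
    for k in range(n):
        c = coeff(k)
        total = [[total[i][j] + c * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = mat_mul(power, X)
    if any(any(row) for row in power):     # power is now X^n
        raise ValueError("X is not nilpotent")
    return total

# Exponential series a_k = 1/k!, applied to the 3 x 3 shift matrix N.
N = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
exp_N = f_of_nilpotent(lambda k: Fraction(1, factorial(k)), N)
```

Here `exp_N` equals $I + N + \tfrac{1}{2}N^2$, since $N^3 = Z$.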


Theorems 10.2 and 10.3 reduce the questions of convergence of a matrix
series to questions about the convergence of the series for a matrix in the form

$J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix},$

since any $X$ is similar to a diagonal block matrix where each block is of this
form. Thus our attention is focused on the convergence of the series

$\sum_{k=0}^{\infty} a_k (\lambda I + N)^k,$

where $\lambda$ is a characteristic value of $X$ and $N$ is nilpotent.

Theorem 10.5. If $\sum_{k=0}^{\infty} a_k x^k$ converges for all $x$ such that $|x| < r$,
and if $|\lambda| < r$ for every characteristic value $\lambda$ of $X$, then $\sum_{k=0}^{\infty} a_k X^k$
converges.

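proof: Consider the diagonal block $J_i$ of the Jordan matrix similar
to $X$ such that $\lambda$ appears in every diagonal position and 1 in every
superdiagonal position. Let

$S_m(J_i) = \sum_{k=0}^{m} a_k (\lambda I + N)^k.$

Since $N$ is nilpotent of some index $p$, there are no more than $p$ terms in
the expansion of $(\lambda I + N)^k$ even for large $k$. Let $m > p$. Then

$$\begin{aligned}
S_m(J_i) = {} & a_0 I \\
& + a_1 \lambda I + a_1 N \\
& + a_2 \lambda^2 I + 2a_2 \lambda N + a_2 N^2 \\
& + \cdots \\
& + a_{m-1} \lambda^{m-1} I + C_1^{m-1} a_{m-1} \lambda^{m-2} N + \cdots + C_{p-1}^{m-1} a_{m-1} \lambda^{m-p} N^{p-1} \\
& + a_m \lambda^m I + C_1^m a_m \lambda^{m-1} N + \cdots + C_{p-1}^m a_m \lambda^{m-p+1} N^{p-1},
\end{aligned}$$

where $C_s^m = \dfrac{m!}{s!\,(m - s)!}$ is the binomial coefficient.

Summing on like powers of $N$, we have

$S_m(J_i) = \sum_{r=0}^{p-1} \left( \sum_{k=r}^{m} a_k C_r^k \lambda^{k-r} \right) N^r.$

But

$C_r^k \lambda^{k-r} = \frac{1}{r!} \left[ \frac{d^r}{dx^r} x^k \right]_{x = \lambda},$

so

$S_m(J_i) = \sum_{r=0}^{p-1} \frac{1}{r!} S_m^{(r)}(\lambda) N^r,$

where $S_m^{(r)}(\lambda)$ is the $r$th derivative of $S_m(x)$, evaluated at $x = \lambda$.
Therefore, $S_m(J_i)$ has the upper triangular form,

$$S_m(J_i) = \begin{pmatrix}
S_m(\lambda) & S_m'(\lambda) & \cdots & \frac{1}{(p-1)!} S_m^{(p-1)}(\lambda) \\
 & S_m(\lambda) & \cdots & \frac{1}{(p-2)!} S_m^{(p-2)}(\lambda) \\
 & & \ddots & \vdots \\
 & & & S_m(\lambda)
\end{pmatrix}.$$

By hypothesis, $\sum_{k=0}^{\infty} a_k \lambda^k$ converges, since $|\lambda| < r$. Thus the sequence
$\left\{ \sum_{k=0}^{m} a_k \lambda^k \right\} = \{S_m(\lambda)\}$ converges. But from the theory of infinite series
it is known that the series obtained by differentiating $\sum_{k=0}^{\infty} a_k x^k$ term
by term will converge for $|x| < r$. Hence the sequence $\{S_m(J_i)\}$
converges, which means that the series $\sum_{k=0}^{\infty} a_k J_i^k$ converges, so the proof
is complete.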

This theorem shows that such functions as $e^X$, $\sin X$, and $\cos X$ exist for
all $X$, since the corresponding scalar series converge for all $x$. However, we
cannot assume that the matrix function possesses all the properties of the
corresponding scalar function; for example, in general

$e^X e^Y \neq e^{X+Y},$

although equality does hold if $X$ and $Y$ commute.
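A small numerical experiment makes the failure of the exponent law concrete. The Python sketch below is an illustration under our own assumptions: the helper names are hypothetical, $e^X$ is approximated by a partial sum of 30 terms, and the two test matrices are chosen by us because they do not commute.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_exp(X, terms=30):
    """Partial sum I + X + X^2/2! + ... + X^terms/terms! of the series for e^X."""
    n = len(X)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, X)]  # X^k / k!
        total = mat_add(total, term)
    return total

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[0.0, 1.0], [0.0, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0]]          # XY != YX

lhs = mat_mul(mat_exp(X), mat_exp(Y))  # e^X e^Y
rhs = mat_exp(mat_add(X, Y))           # e^(X+Y)
gap = max_abs_diff(lhs, rhs)
```

For these matrices $e^X e^Y$ has entries 2, 1, 1, 1, while $e^{X+Y}$ has $\cosh 1$ on the diagonal and $\sinh 1$ off it, so `gap` is far from zero. Replacing `Y` by a multiple of `X`, which commutes with `X`, makes the two sides agree to rounding error.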

Another important consequence of Theorem 10.5 is the special role played
by the characteristic value of largest absolute value. If $f$ is defined by an
infinite series whose radius of convergence is $r$, and if $\lambda_0$ is the characteristic
value of largest magnitude of a matrix $X$, then $f(X)$ converges if $|\lambda_0| < r$ and
diverges if $|\lambda_0| > r$. If $|\lambda_0| = r$, $f(X)$ may or may not converge. A simple
relation between the characteristic values of $X$ and those of $f(X)$ is given in
the next theorem.

Theorem 10.6. If $f(X)$ converges and if $\lambda$ is a characteristic value of
$X$, then $f(\lambda)$ is a characteristic value of $f(X)$.

proof: If $J = PXP^{-1}$ is the Jordan form of $X$, then $X$ and $J$ have
the same characteristic values, and $f(J) = Pf(X)P^{-1}$. But the proof
of Theorem 10.5 shows that $f(J)$ is an upper triangular matrix with
$f(\lambda_1), f(\lambda_2), \dots, f(\lambda_n)$ as the diagonal elements. Hence these are the
characteristic values of $f(J)$ and of the similar matrix $f(X)$.

Theorem 10.7. For every matrix $X$, $\det e^X = e^{\operatorname{tr} X}$ and therefore $e^X$ is
nonsingular. Furthermore, $(e^X)^{-1} = e^{-X}$.

proof : Exercise. 
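Both statements of Theorem 10.7 can be checked numerically on a small matrix. The Python sketch below is a sanity check, not a proof; the helper names and the test matrix are our own choices, and $e^X$ is approximated by a partial sum of 60 terms.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(X, terms=60):
    """Partial sum of the series for e^X."""
    n = len(X)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, X)]  # X^k / k!
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

X = [[1.0, 2.0], [3.0, 4.0]]
neg_X = [[-v for v in row] for row in X]

E = mat_exp(X)
det_E = det2(E)                        # should be close to e^(tr X) = e^5
product = mat_mul(E, mat_exp(neg_X))   # should be close to the identity
```

The value `det_E` agrees with `math.exp(5.0)` up to rounding, and `product` is the identity matrix up to rounding.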

Exercises 

1. Prove each of the following theorems: 

(i) Theorem 10.2, 

(ii) Theorem 10.3, 

(iii) Theorem 10.4, 

(iv) Theorem 10.7. 

2. Evaluate $e^A$, given

(iv) $A = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$.

3. Prove that a series $\sum_{k=0}^{\infty} A^{(k)}$ of matrices converges if and only if the
sequence $\{S^{(n)}\}$ of matrices converges, where $S^{(n)} = A^{(0)} + A^{(1)} + \cdots + A^{(n)}$.
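4. Let $f(x) = \sum_{k=0}^{\infty} a_k x^k$, let $A$ be a matrix with distinct characteristic
values $\lambda_1, \lambda_2, \dots, \lambda_n$ for which $f(A)$ converges. It can be proved that

$f(A) = D^{-1}[D_0 I + D_1 A + \cdots + D_{n-1} A^{n-1}],$

where $D$ is the Vandermonde determinant

$$D = \begin{vmatrix}
1 & 1 & \cdots & 1 \\
\lambda_1 & \lambda_2 & \cdots & \lambda_n \\
\lambda_1^2 & \lambda_2^2 & \cdots & \lambda_n^2 \\
\vdots & \vdots & & \vdots \\
\lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1}
\end{vmatrix}$$

and $D_k$ is the determinant that coincides with $D$ except that the $(k + 1)$
row vector is $(f(\lambda_1), f(\lambda_2), \dots, f(\lambda_n))$. (This formula is reminiscent of Cramer's
rule.)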

(i) Apply the method described above to calculate $f(A)$, where

$A = \begin{pmatrix} 3 & 2 & 2 \\ 1 & 4 & 1 \\ -2 & -4 & -1 \end{pmatrix}.$

(ii) Check your result in (i) by using it to compute $A^2$.

(iii) For $A$ as given in (i) find necessary and sufficient conditions on $f$
that $f(A)$ be a scalar matrix.

(iv) Calculate $e^A$.


§10.2. Matrices of Functions

We now consider an $m \times n$ matrix whose entries are not scalars, but
scalar-valued functions. To be specific, let

$X(t) = (x_{ij}(t)), \quad i = 1, 2, \dots, m; \ j = 1, 2, \dots, n,$

where $x_{ij}$ is a real-valued function defined for all $t$ in an interval $a < t < b$.
Now $X$ is a function whose domain includes $a < t < b$, and whose value at $t$
is an $m \times n$ matrix of real numbers. In this section we shall indicate how a
calculus of matrices of functions can be defined.

The idea is extremely simple, since continuity, derivative, and integral are 
defined component by component. 




Definition 10.4. Let $X$ be an $m \times n$ matrix of real-valued functions $x_{ij}$
which are defined for all $t$ in some set of real numbers.

(a) $X$ is continuous at $t_0$ if and only if $x_{ij}$ is continuous at $t_0$ for
$i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$.

(b) The derivative $\mathcal{D}X$ of $X$ is defined by

$\mathcal{D}X(t_0) = \left( \frac{d}{dt} x_{ij}(t_0) \right),$

if and only if $\frac{d}{dt} x_{ij}(t_0)$ exists for $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$.

(c) The definite integral $\int_a^b X$ is defined by

$\int_a^b X = \left( \int_a^b x_{ij}(t)\,dt \right),$

if and only if $\int_a^b x_{ij}(t)\,dt$ exists for $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$.

We shall develop only enough calculus of matrices to enable us to apply 
our results to the solution of a differential equation. 

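Theorem 10.8. If $X$ is $n \times n$, if $Y$ is $n \times p$, and if $\mathcal{D}X$ and $\mathcal{D}Y$ exist,
then

(a) $\mathcal{D}(XY) = (\mathcal{D}X)Y + X(\mathcal{D}Y)$,

(b) $\mathcal{D}(X^p) = \sum_{k=0}^{p-1} X^k (\mathcal{D}X) X^{p-1-k}$, $p$ a positive integer,

(c) $\mathcal{D}(X^{-1}) = -X^{-1}(\mathcal{D}X)X^{-1}$, if $X$ is nonsingular.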

proof: Exercise. 

Observe that these rules are generalizations of the corresponding results 
for scalar functions, which reduce to the usual differentiation formulas if all 
of the matrices involved commute with each other. In general, however, 
even $X$ and $\mathcal{D}X$ do not commute, so we must be careful to preserve the order
of the matrices in these formulas.


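Theorem 10.9. If $A$ is an $n \times n$ matrix of scalars, then

$\mathcal{D}(e^{At}) = Ae^{At} = e^{At}A.$

proof: By definition, we have

$e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!} = I + At + \frac{A^2t^2}{2!} + \cdots,$

$\mathcal{D}(e^{At}) = \sum_{k=1}^{\infty} \frac{A(At)^{k-1}}{(k-1)!} = \sum_{j=0}^{\infty} \frac{A(At)^j}{j!} = Ae^{At} = e^{At}A.$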
This result is reminiscent of the corresponding property of the exponential
function

$\frac{d}{dt} e^{at} = ae^{at},$

which plays such an important role in the solution of certain differential
equations. In the next theorem we consider the analogous problem for matrices.


Theorem 10.10. Consider the matrix differential equation

$\mathcal{D}X(t) = X(t)A,$

where $X(t)$ is an $m \times n$ matrix of differentiable functions and $A$ is an
$n \times n$ matrix of scalars. Then

(a) for any fixed value $t_0$ of $t$, the matrix

$X(t) = X(t_0)e^{(t - t_0)A}$

is a solution of the differential equation,

(b) any solution is of the form specified in (a) for some value of $t_0$,

(c) the rank of a given solution $X(t)$ is the same for all values of $t$.

proof: Conclusion (a) follows directly from Theorem 10.9. To
prove (b) let $X(t)$ be any solution, and let

$Y(t) = X(t)e^{-At}.$

Then

$\mathcal{D}Y(t) = [\mathcal{D}X(t)]e^{-At} + X(t)\mathcal{D}e^{-At}$
$\qquad = [\mathcal{D}X(t)]e^{-At} - X(t)Ae^{-At}$
$\qquad = [\mathcal{D}X(t) - X(t)A]e^{-At}$
$\qquad = Z,$

since $\mathcal{D}X(t) = X(t)A$ by hypothesis. Then $Y$ is a constant $m \times n$
matrix $C$. For any value $t_0$ of $t$,

$Y(t_0) = C = X(t_0)e^{-At_0},$

so

$X(t) = Ce^{At} = X(t_0)(e^{-At_0})e^{At}$
$\qquad = X(t_0)e^{(t - t_0)A},$

where the last equality is valid since $At_0$ and $At$ commute. Furthermore,
by Theorem 10.7, $e^{(t - t_0)A}$ is nonsingular, so the rank of $X(t)$ is the same
as the rank of $X(t_0)$, which proves (c).


In the next section we shall apply Theorem 10.10 in the special case in
which $X$ is a row vector.
Exercises

1. Prove that $\mathcal{D}X$ as given in Definition 10.4 satisfies

$\mathcal{D}X(t_0) = \lim_{h \to 0} \frac{1}{h}[X(t_0 + h) - X(t_0)]$

if limit is understood to operate on each component.

2. Prove Theorem 10.8. 

3. Does an analogue of the fundamental theorem of calculus hold for 
matrix calculus? Explain. 

4. Does an analogue of the mean value theorem of the derivative hold for
matrix calculus? Explain.

5. Derive an expression for the derivative of det X(t). 

6. Derive an expression for the derivative of $\operatorname{tr} X(t)$, and prove that

$\frac{d}{dt}[\operatorname{tr} X(t)] = \operatorname{tr}[\mathcal{D}X(t)].$

§10.3. An Application to Differential Equations 

We now consider the problem of determining all solutions of a linear, 
homogeneous, nth order differential equation with constant coefficients, 

(10.1) $\frac{d^n}{dt^n}x + a_1 \frac{d^{n-1}}{dt^{n-1}}x + \cdots + a_{n-1}\frac{d}{dt}x + a_n x = 0.$

Most of the essential facts concerning the solutions of this equation can be 
verified easily by direct calculations. Our first observations are that any 
scalar multiple of a solution is also a solution, and that the sum of any two 
solutions is a solution. Therefore, the set of all solutions of (10.1) forms a 
vector space, called the solution space. Furthermore, the dimension of the 
solution space is n, a fact which follows from a theorem concerning the 
uniqueness of solutions of (10.1). Therefore, the problem of finding all solu- 
tions of (10.1) reduces to the problem of finding a basis for the solution space, 
a set of n linearly independent solutions. 

If we let $x = e^{rt}$, then $x$ is a solution if and only if $r$ satisfies the
polynomial equation

(10.2) $r^n + a_1 r^{n-1} + \cdots + a_{n-1}r + a_n = 0.$

At this point we separate the discussion into two cases. First, if the roots
$r_1, r_2, \dots, r_n$ of (10.2) are distinct, then the solution set $\{e^{r_1 t}, e^{r_2 t}, \dots, e^{r_n t}\}$
is linearly independent, so any solution has the form

$x = \sum_{i=1}^{n} c_i e^{r_i t},$

where the $c_i$ are suitable scalars. As the alternative to the case in which
(10.2) has $n$ distinct roots, suppose that the distinct roots are $r_1, r_2, \dots, r_k$,
and that $r_i$ is a root of multiplicity $m_i$, for $i = 1, 2, \dots, k$. Then for each $i$
each of the functions

$\{e^{r_i t}, te^{r_i t}, \dots, t^{m_i - 1}e^{r_i t}\}$

is a solution, and this set of solutions is linearly independent. The collection
of all solutions obtained in this way by letting $i$ vary from 1 to $k$ forms a basis
for the solution space. Thus we have full information concerning the
solutions of (10.1).

Although verification of the preceding statements is not difficult, this
approach is not completely satisfying because the success of the method
seems to depend upon a mystic process of guessing that functions such as
$t^2 e^{r_i t}$ are solutions. In this section we shall restate and resolve the problem
in matrix notation, observing that the solutions arise in a natural way from
the Jordan form of a matrix determined by (10.1).

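First we make a change of variables in order to replace the single
equation (10.1) of order $n$ by a set of $n$ equations of order 1. Let $y_1 = x$,
$y_2 = \frac{d}{dt}y_1$, ..., $y_n = \frac{d}{dt}y_{n-1}$. Then we obtain the $n$ equations

(10.3)

$\frac{d}{dt}y_1 = y_2$

$\frac{d}{dt}y_2 = y_3$

$\qquad\vdots$

$\frac{d}{dt}y_{n-1} = y_n$

$\frac{d}{dt}y_n = -a_n y_1 - a_{n-1}y_2 - \cdots - a_1 y_n.$

Let

$Y = (y_1, y_2, \dots, y_n),$

and let

$$A = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & -a_n \\
1 & 0 & 0 & \cdots & 0 & -a_{n-1} \\
0 & 1 & 0 & \cdots & 0 & -a_{n-2} \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -a_1
\end{pmatrix}.$$

Then (10.3) or, equivalently, (10.1) is written simply $\mathcal{D}Y = YA$, and by
Theorem 10.10 any solution is of the form

$Y(t) = Y(t_0)e^{(t - t_0)A}.$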
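The construction can be tried on a concrete equation. The Python sketch below is an illustration under our own assumptions: the function names `companion` and `solve` are hypothetical, the row vector $Y(t_0)e^{(t - t_0)A}$ is approximated by a partial sum of the exponential series, and the test equation $x'' + 3x' + 2x = 0$ is our choice, not taken from the text.

```python
import math

def companion(a):
    """Matrix A of the text for coefficients a = [a_1, ..., a_n]."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = 1.0            # ones on the subdiagonal
    for i in range(n):
        A[i][n - 1] = -a[n - 1 - i]  # last column: -a_n, ..., -a_1
    return A

def solve(a, y0, t, t0=0.0, terms=60):
    """Approximate Y(t) = Y(t0) e^{(t - t0) A} for the row vector Y(t0) = y0."""
    A = companion(a)
    n = len(a)
    s = t - t0
    result = [float(v) for v in y0]
    term = result[:]
    for k in range(1, terms + 1):
        # term becomes y0 (sA)^k / k!  (row vector times matrix)
        term = [sum(term[i] * A[i][j] for i in range(n)) * s / k
                for j in range(n)]
        result = [r + v for r, v in zip(result, term)]
    return result

# x'' + 3x' + 2x = 0 with x(0) = 1, x'(0) = 0; the roots of (10.2) are -1 and -2.
y = solve([3.0, 2.0], [1.0, 0.0], 1.0)
```

For this equation the solution with the given initial values is $x = 2e^{-t} - e^{-2t}$, so `y[0]` should agree with $2e^{-1} - e^{-2}$ and `y[1]` with the derivative $-2e^{-1} + 2e^{-2}$, up to rounding.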
Now for some nonsingular matrix $P$, $J = P^{-1}AP$, where $J$ is a Jordan matrix.
Let $W = YP$; then

$W(t) = Y(t)P = [Y(t_0)P]P^{-1}e^{(t - t_0)A}P$
$\qquad = W(t_0)e^{(t - t_0)J}$
$\qquad = W(t_0)e^{(t - t_0)[\operatorname{diag}(\lambda_1, \dots, \lambda_n) + N]}.$

The characteristic values $\lambda_i$ of $A$ are the solutions of (10.2) because the
characteristic equation of $A$ is

$\lambda^n + a_1\lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n = 0.$

(See Exercise 7, §7.3, and apply the result to $A'$.) We now consider the
form of $W(t)$ for two special cases.

Case I. $\lambda_1, \lambda_2, \dots, \lambda_n$ all distinct. Then $N = Z$, and we have

$W(t) = W(t_0)e^{(t - t_0)\operatorname{diag}(\lambda_1, \dots, \lambda_n)}$
$\qquad = W(t_0)\operatorname{diag}(e^{\lambda_1(t - t_0)}, \dots, e^{\lambda_n(t - t_0)})$
$\qquad = (w_1(t_0)e^{\lambda_1(t - t_0)}, \dots, w_n(t_0)e^{\lambda_n(t - t_0)}),$

where the $k$th component of the vector $W(t_0)$ is denoted $w_k(t_0)$. Thus any
solution vector is a linear combination of the vectors determined by the
distinct characteristic values,

$(0, \dots, 0, e^{\lambda_k(t - t_0)}, 0, \dots, 0).$

Case II. $\lambda_1 = \lambda_2 = \cdots = \lambda_n$. Then $N$ has zeros and ones on the
superdiagonal, and

$W(t) = W(t_0)e^{(t - t_0)(\lambda I + N)}$
$\qquad = W(t_0)[e^{\lambda(t - t_0)}I][e^{(t - t_0)N}]$
$\qquad = W(t_0)e^{\lambda(t - t_0)}\left[ I + (t - t_0)N + \cdots + \frac{(t - t_0)^{n-1}}{(n-1)!}N^{n-1} \right].$

Thus an arbitrary solution vector has for its $k$th component an expression of
the form

$p_k(t)e^{\lambda(t - t_0)},$

where $p_k(t)$ is a polynomial in $t$ of degree less than $k$. Those polynomials are
generated by the various powers of the nonvanishing nilpotent part of the
Jordan form of the matrix $A$. Thus a basis for the solution space can be
chosen to be the vectors

$\{e^{\lambda t}, te^{\lambda t}, \dots, t^{n-1}e^{\lambda t}\}.$

The general case in which there are $r$ distinct characteristic values $\lambda_i$, each
of multiplicity $m_i$, is a combination of the two cases described, any solution
vector being a linear combination of the vectors

$p_j(t)e^{\lambda_j t},$

where $p_j(t)$ is a polynomial of degree less than $m_j$, $j = 1, 2, \dots, r$. Hence a
basis for the solution space is

$\{e^{\lambda_1 t}, te^{\lambda_1 t}, \dots, t^{m_1 - 1}e^{\lambda_1 t}, \ e^{\lambda_2 t}, \dots, t^{m_2 - 1}e^{\lambda_2 t}, \ \dots, \ e^{\lambda_r t}, te^{\lambda_r t}, \dots, t^{m_r - 1}e^{\lambda_r t}\}.$

Exercises 


1. Work through the method of the text to solve the differential equation 
jP ^ l _L _« 

All x ,m J At x ,r 


showing that any solution vector can be written in the form 

(ae 1 , (at -f b)e', t r~'). 


Observe the manner in which the second component arises from the nilpotent 
part of the Jordan form of the matrix determined by the equation. 
The objective of this appendix is to formulate fundamental algebraic concepts
in terms of the notion of sets, thus supplementing the discussion of
abstract systems in Chapter 1. The treatment is brief and somewhat formal, 
but exercises are provided to augment the reader’s understanding of this 
material. 


§A.1. Cartesian Product of Sets

In § 1.3 we defined the cartesian product S X S of a set S with itself. It is 
quite clear that this definition can be generalized in two ways: first by con- 
sidering the cartesian product S X T of any two sets, and then by extending 
the number of sets involved from two to n. 

Definition A.1. Let $S_1, S_2, \dots, S_n$ be any sets. The cartesian product
$S_1 \times S_2 \times \cdots \times S_n$ is the set of all ordered $n$-tuples,

$S_1 \times S_2 \times \cdots \times S_n = \{(s_1, s_2, \dots, s_n) \mid s_i \in S_i \text{ for } i = 1, \dots, n\}.$

Two $n$-tuples are equal,

$(s_1, s_2, \dots, s_n) = (t_1, t_2, \dots, t_n),$

if and only if $s_i = t_i$ for every $i = 1, 2, \dots, n$.

Thus for each $i$, the $i$th component of any element of $S_1 \times S_2 \times \cdots \times S_n$
is an element of $S_i$, and $S_1 \times S_2 \times \cdots \times S_n$ consists of all $n$-tuples which
can be formed in this manner.
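For finite sets, Definition A.1 can be realized directly in a few lines. The Python sketch below is ours; it relies on `itertools.product` from the standard library, and the three small sets are arbitrary examples.

```python
from itertools import product

def cartesian_product(*sets):
    """Set of all ordered n-tuples (s_1, ..., s_n) with s_i taken from sets[i-1]."""
    return set(product(*sets))

S1 = {0, 1}
S2 = {"a", "b", "c"}
S3 = {"x", "y"}

P = cartesian_product(S1, S2, S3)
```

Each element of `P` is a tuple whose $i$th component lies in the $i$th set, and tuples are compared component by component, as in the definition; for instance `(0, "a", "x")` is in `P` but `("a", 0, "x")` is not.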
Exercises

1. If $S_i$ is a set of $m_i$ elements, for $i = 1, 2, \dots, n$, how many elements
are in the set $S_1 \times S_2 \times \cdots \times S_n$?

2. Let $I$ denote the set of all integers, and let $R$ denote the set of all real
numbers. Describe by means of a graph each of the following sets:

(i) $I \times R$,

(ii) $R \times I$.

3. Let $S$ and $T$ be arbitrary sets.

(i) Prove that $S \times T$ and $T \times S$ have an element in common if and
only if $S$ and $T$ have an element in common.

(ii) If $S$ and $T$ have $m$ elements in common, how many elements do
$S \times T$ and $T \times S$ have in common?

(iii) If $T \subset S$, show that $T \times T = (T \times S) \cap (S \times T)$.

4. For each $i = 1, 2, \dots, n$ let $\mathcal{S}_i = \{E_i; \circ_i\}$ be an abstract system
having one binary operation which is closed on $E_i$. An operation $*$ is
defined on $E_1 \times E_2 \times \cdots \times E_n$ by the rule

$(a_1, a_2, \dots, a_n) * (b_1, b_2, \dots, b_n) = (c_1, c_2, \dots, c_n),$

where $c_i = a_i \circ_i b_i$ for each $i$. Prove the following.

(i) $*$ is closed on $E_1 \times E_2 \times \cdots \times E_n$.

(ii) $*$ is associative if and only if each $\circ_i$ is associative.

(iii) If each $E_i$ has an identity element $e_i$ relative to $\circ_i$, then $E_1 \times
E_2 \times \cdots \times E_n$ has an identity element relative to $*$.

§A.2. Binary Relations 

Definition A.2. A binary relation $R$ from a set $A$ into a set $B$ is a
subset of $A \times B$. If $(a, b) \in R$, we say that $a$ is related to $b$, and write

$a \, R \, b.$

The domain of $R$ is the set of all elements of $A$ which are related by $R$
to at least one element of $B$;

$\operatorname{dom} R = \{a \in A \mid a \, R \, y \text{ for some } y \in B\}.$

The range of $R$ is the set of all elements of $B$ to which at least one element
of $A$ is related by $R$;

$\operatorname{range} R = \{b \in B \mid x \, R \, b \text{ for some } x \in A\}.$

A binary relation from A into A is called a relation in A. 
It is sometimes convenient to think of a relation in geometric terms as an
association, or many-valued correspondence, from $A$ into $B$. Each element
in dom $R$ is associated by $R$ with one or more elements in range $R$. For
example, let $A = \{a_1, a_2, a_3\}$, $B = \{b_1, b_2, b_3, b_4\}$; let $R$ be the subset of $A \times B$
which consists of the pairs $(a_1, b_1)$, $(a_1, b_2)$, $(a_2, b_1)$, $(a_2, b_4)$, and $(a_3, b_4)$. Then
dom $R = A$ and range $R = \{b_1, b_2, b_4\} \subset B$. $R$ can be represented
geometrically by the following diagram.


Exercises 

1. How many different binary relations can be defined from a set of m 
elements to a set of n elements? 

2. Describe the number pairs that comprise each of the following relations. 

(i) The order relation < for real numbers. 

(ii) The relation "$a$ divides $b$ evenly" for positive integers.

3. A binary relation $R$ on a set $A$ is called a partial ordering of $A$ whenever
$R$ is

reflexive: $a \, R \, a$ for every $a \in A$,

anti-symmetric: if $a \, R \, b$ and $b \, R \, a$, then $a = b$,

transitive: if $a \, R \, b$ and $b \, R \, c$, then $a \, R \, c$.

(i) Show that the subset relation is a partial ordering of the collection
$K$ of all subsets of a set $S$.

(ii) Show that "$a$ divides $b$" defines a partial ordering of the positive
integers.

(iii) Show that "$f(x) \leq g(x)$ for all $x \in [0, 1]$" defines a partial
ordering of the set of all real valued functions whose domain of definition
includes the interval $[0, 1]$.



§A.3. Functions

Definition A.3. A function $F$ from a set $A$ to a set $B$ is a binary relation
from $A$ into $B$ that satisfies the additional properties

(a) $\operatorname{dom} F \neq \Phi$,

(b) if $a \, F \, b_1$ and $a \, F \, b_2$, then $b_1 = b_2$.

The domain and range of $F$ are its domain and range as defined for
relations.

The essential condition that distinguishes a function from an arbitrary
relation is the requirement that each $a \in \operatorname{dom} F$ be associated with one and
only one $b \in \operatorname{range} F$. Because $b$ is uniquely determined by $a$ and $F$, the
relation notation $a \, F \, b$ can be replaced for functions by the notation described
in § 1.4,

$b = aF.$

Thus

$\operatorname{dom} F = \{a \in A \mid y = aF \text{ for some } y \in B\},$
$\operatorname{range} F = \{b \in B \mid b = xF \text{ for some } x \in A\}.$

If range $F = B$, we say that $F$ is a function from $A$ onto $B$.

Described geometrically, each a e dom F is associated with one and only 
one image aF e range F. To distinguish the unique association of a function 
from the multiple association of an arbitrary relation, we say that a function 
is a mapping. Each image aF of the mapping F is uniquely determined by 
its antecedent $a$. However, it is possible that different antecedents, $a \neq x$,
determine the same image, $aF = xF$. In general, mappings are many-to-one,
which means that many distinct points of the domain are mapped into the 
same image point of the range. 

Definition A.4. A mapping $F$ from $A$ to $B$ is said to be one-to-one (or
reversible) if and only if for all $a, x \in \operatorname{dom} F$ whenever $aF = xF$, then
$a = x$. The inverse $F^*$ of a one-to-one mapping is the mapping from $B$
to $A$ defined by

$\operatorname{dom} F^* = \operatorname{range} F,$
$\operatorname{range} F^* = \operatorname{dom} F,$

$yF^* = x,$

where $x$ is the uniquely determined element of dom $F$ such that $xF = y$.
$F^*$ is frequently denoted by $F^{-1}$.

Exercises 

1. By recalling the definition of a function in terms of sets, state what it 
means for two functions to be equal. 

2. How many different functions can be defined from a set A having m 
elements to a set B having n elements? (Recall that the domain of such a 
function can be any nonvoid subset of A.) 
3. Consider the function $F$ of two real variables defined by

$F(x, y) = x\sqrt{1 - y}\,\sin(x + y).$

Show how this function is described by Definition A.3. Specify the domain
and range of $F$.

4. Let R be the set of all real numbers. Give a specific example of a func- 
tion F from R to R for each of the following conditions. 

(i) dom $F \subset R$, range $F \subset R$.

(ii) dom $F = R$, range $F \subset R$, $F$ one-to-one.

(iii) dom $F \subset R$, range $F = R$, $F$ not one-to-one.

(iv) dom $F = R$, range $F = R$, $F$ not one-to-one.

(v) dom $F = R$, range $F = R$, $F$ one-to-one.

5. Let F be a one-to-one mapping with domain A and range B. Describe 
each of the mappings FF* and F*F. 

6. Prove that the inverse of a one-to-one mapping F is uniquely deter- 
mined. 


§A.4. Binary Operations 

Definition A.5. A binary operation on a set A is a function from A X A 
into A. An operation on A is said to be closed if and only if the domain 
of the operation is the full set A X A. More generally, an n-ary operation 
on A is a function whose domain is the set A X A X • • • X A (n times) 
of ordered n-tuples of elements of A and whose range is a subset of A. 

It is customary to use special symbols, rather than function notation, for
binary operations. Thus $a_1 * a_2$ denotes the image in $A$ of the ordered pair
$(a_1, a_2) \in A \times A$ under the mapping that defines the operation $*$.

Exercises 

1. Explain how a closed binary operation on a set $S$ can be considered as
a subset of $S \times S \times S$, having special properties. List all properties needed
to make your description precise.

2. How many closed binary operations can be defined on a set having n 
elements? 

3. Let R be the set of real numbers, described geometrically by a coordi- 
nate axis. State what is meant by a graph of each of the following, listing any 
special restrictions you need to make your description accurate. 
(i) A binary relation on R.

(ii) A function from R to R. 

(iii) A closed binary operation on R . 


§A.5. Summary of Abstract Systems 

In § 1.5 an abstract system

$\mathcal{S} = \{E; R; O\}$

was described as a set $E$ of elements, a set $R$ of binary relations in $E$, and a
set $O$ of closed operations on $E$, together with a set of postulates which endow
the elements, relations, and operations with their distinctive properties.
Each binary relation is a subset of $E \times E$; each $n$-ary operation is a function
from $E \times E \times \cdots \times E$ ($n$ times) into $E$. Since a function is a type of
relation, each $n$-ary operation can be described as a subset of $E \times E \times \cdots \times E$
($n + 1$ times). In this way the components of an abstract system are
represented as subsets of sets formed from the set of elements of the system.


§A.6. Boolean Algebras 

As an illustration of an abstract system, we return to the specific example
of the collection $K$ of all subsets of a given set $S$, considered in § 1.2. This
system can be denoted

$\mathcal{S} = \{K; \ =, \subset; \ \cup, \cap, {}'\}.$

The two binary relations, two binary operations, and one unary operation
were described in our introductory discussion of sets in terms of the
undefined concepts of "set" and "membership"; in effect, therefore, our only
postulate for the system $\mathcal{S}$ was that we understood these two notions.
Therefore our approach was intuitive and concrete rather than formal and abstract.

We now undertake to develop an abstract system which will serve as a
model for the algebra of sets. We consider a system $\mathcal{B}$ composed of any set
$E$ of elements, two binary operations on $E$, and one unary operation on $E$:

$\mathcal{B} = \{E; \ \vee, \wedge, {}^*\}.$

To make sure that the abstract system $\mathcal{B}$ contains the concrete system $\mathcal{S}$ as
a specific example, we shall assume as postulates for $\mathcal{B}$ statements which are
valid in $\mathcal{S}$ when we interpret the three operations of $\mathcal{B}$ to mean union,
intersection, and complementation of sets. In selecting such postulates, various
choices are possible; what we seek is a simple set of statements about the
operations that will produce an accurate model of the algebra of sets. The
following axioms were first given by E. V. Huntington in 1904.

B1. The operations $\vee$ and $\wedge$ are commutative: for all $x, y \in E$,

$x \vee y = y \vee x$,
$x \wedge y = y \wedge x$.

B2. Each of the operations $\vee$ and $\wedge$ is distributive over the other: for
all $w, x, y \in E$,

$w \vee (x \wedge y) = (w \vee x) \wedge (w \vee y)$,

$w \wedge (x \vee y) = (w \wedge x) \vee (w \wedge y)$.

B3. There exist in $E$ distinct elements $z$ and $u$ which are identity elements
for $\vee$ and $\wedge$ respectively; for all $x \in E$,

$x \vee z = x$,

$x \wedge u = x$.

B4. For all $x \in E$ the operation $*$ satisfies

$x \vee x^* = u$,
$x \wedge x^* = z$.

Definition A.6. Any system $\mathcal{B} = \{E; \vee, \wedge, *\}$ satisfying axioms B1-B4
is called a Boolean algebra.

In terms of our intuitive concept of membership and our understanding of
the meaning of the words used to define set union, intersection, and
complementation, we observe that the system $\mathcal{S}$ of all subsets of a given nonvoid
set $S$ is indeed a Boolean algebra whose identity elements are the void set $\Phi$
and $S$. It is remarkable, however, that the postulates for $\mathcal{B}$ do not mention any
relations on $E$, whereas our description of $\mathcal{S}$ made frequent use of the subset
relation. In effect, equality in $\mathcal{B}$ is defined implicitly by the four postulates.
Furthermore, a relation $\leq$ can be defined on $E$ in terms of the operation $\vee$:
for all $x, y \in E$

$x \leq y$ if and only if $x \vee y = y$.

Observe that this definition is consistent with the result of Exercise 3(i),
§ 1.2.

Having recognized that the algebraic system $\mathcal{S}$ of all subsets of a given
set $S$ is a Boolean algebra, we may ask whether the converse is true. Given
any Boolean algebra $\mathcal{B}$, does a set $S$ exist such that $\mathcal{B}$ represents a system
of subsets of $S$? An affirmative answer to this question was given by
M. H. Stone in 1936: every Boolean algebra is isomorphic to a system of subsets
of some set $S$ which is closed under union, intersection, and complementation
(for a finite Boolean algebra, the system of all subsets of $S$). In this sense the
abstract system which we call Boolean algebra is an accurate model of the
concrete example of the algebra of subsets.
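As a check on the claim that the subsets of a set form a Boolean algebra, the following sketch tests axioms B1-B4 by exhaustive search. The three-element set and the function names are assumptions made for illustration; the test is a finite verification, not a proof for arbitrary sets.

```python
from itertools import combinations


def power_set(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_boolean_algebra(E, join, meet, comp, z, u):
    """Check Huntington's axioms B1-B4 by exhaustive search over a finite E."""
    b1 = all(join(x, y) == join(y, x) and meet(x, y) == meet(y, x)
             for x in E for y in E)
    b2 = all(join(w, meet(x, y)) == meet(join(w, x), join(w, y))
             and meet(w, join(x, y)) == join(meet(w, x), meet(w, y))
             for w in E for x in E for y in E)
    b3 = z != u and all(join(x, z) == x and meet(x, u) == x for x in E)
    b4 = all(join(x, comp(x)) == u and meet(x, comp(x)) == z for x in E)
    return b1 and b2 and b3 and b4


S = frozenset({"a", "b", "c"})
K = power_set(S)
ok = is_boolean_algebra(K,
                        lambda x, y: x | y,   # union plays the role of "join"
                        lambda x, y: x & y,   # intersection plays the role of "meet"
                        lambda x: S - x,      # complement relative to S
                        frozenset(), S)       # identity elements z and u

# The derived order "x <= y iff x join y = y" coincides with set inclusion.
order_matches = all(((x | y) == y) == (x <= y) for x in K for y in K)
```

The last line illustrates the remark that the subset relation need not be postulated separately: it is recoverable from the join operation alone.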

Exercises 

Following is a list of theorems, statements which are valid for all elements
$w, x, y$ of any Boolean algebra. Theorems in the second column are dual to
those in the first, obtained by interchanging $\vee$ and $\wedge$, $u$ and $z$, and reversing
the relation $\leq$. Duality occurs because of the symmetric nature of the
postulates for a Boolean algebra. Prove as many of these statements as you
can, using only the postulates or preceding statements in this sequence.

1. $x \vee x = x$.                              1'. $x \wedge x = x$.

2. $x \vee u = u$.                              2'. $x \wedge z = z$.

3. $x \vee (x \wedge y) = x$.                   3'. $x \wedge (x \vee y) = x$.

4. $w \vee (x \vee y) = (w \vee x) \vee y$.     4'. $w \wedge (x \wedge y) = (w \wedge x) \wedge y$.

5. $x^*$ is unique.

6. $(x^*)^* = x$.

7. $u^* = z$.                                   7'. $z^* = u$.

8. $(x \vee y)^* = x^* \wedge y^*$.             8'. $(x \wedge y)^* = x^* \vee y^*$.

9. $x \leq y$ if and only if $x \wedge y = x$.

10. If $w \leq x$ and $x \leq y$, then $w \leq y$.

11. If $x \leq w$ and $y \leq w$, then $x \vee y \leq w$.

12. If $w \leq x$ then $w \leq x \vee y$ for any $y$.

13. $x \leq y$ if and only if $y^* \leq x^*$.

§A.7. Groups 

Another important example of a general algebraic system is a group, which
was mentioned briefly in the discussion of fields in § 1.6.

Definition A.7. A group is any system $\mathcal{G} = \{G; *\}$ having one closed
binary operation and satisfying the following postulates.

G1. $*$ is associative.

G2. An identity element $i$ exists in $G$.

G3. Each $g \in G$ has an inverse $g' \in G$.

A group is said to be commutative if and only if

G4. $*$ is commutative.

Examples of Groups

(a) The integers (or the rational, real, or complex numbers) with addition
as operation.

(b) The positive rational (or real) numbers with multiplication as operation.

(c) The nonzero rational (or real or complex) numbers with multiplication
as operation.

(d) The mappings of Exercise 4, § 1.5.
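The examples above are infinite, so they cannot be checked by enumeration. For a finite system, however, postulates G1-G4 can be tested directly. The sketch below does this; the function names and the modular examples are assumptions chosen for illustration.

```python
from itertools import product


def is_group(G, op):
    """Check closure and postulates G1-G3 for a finite set G under op."""
    G = list(G)
    if not all(op(a, b) in G for a, b in product(G, repeat=2)):
        return False                                   # operation is not closed
    if not all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(G, repeat=3)):
        return False                                   # G1 fails
    identities = [i for i in G
                  if all(op(i, g) == g and op(g, i) == g for g in G)]
    if not identities:
        return False                                   # G2 fails
    i = identities[0]
    # G3: every g has some h with g * h = h * g = i.
    return all(any(op(g, h) == i and op(h, g) == i for h in G) for g in G)


def is_commutative(G, op):
    """Check postulate G4."""
    return all(op(a, b) == op(b, a) for a, b in product(list(G), repeat=2))


add_mod_4 = lambda a, b: (a + b) % 4
mul_mod_4 = lambda a, b: (a * b) % 4
mul_mod_5 = lambda a, b: (a * b) % 5
```

For instance, `is_group(range(4), add_mod_4)` holds, while `is_group(range(4), mul_mod_4)` fails because 0 has no multiplicative inverse; compare examples (a) and (c), where zero must be excluded before multiplication yields a group.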

Exercises 

1. The n complex nth roots of unity are the numbers

$e_k = \cos\frac{2k\pi}{n} + i \sin\frac{2k\pi}{n}, \quad k = 0, 1, \ldots, (n - 1)$,

where $i^2 = -1$.

(i) Prove that for n = 3 the three cube roots of unity, together with 
multiplication of complex numbers, form a group. 

(ii) Prove the corresponding result for any n. 

2. For any group $\mathcal{G}$ prove that

(i) there is only one identity element,

(ii) each element has only one inverse,

(iii) $(x')' = x$,

(iv) $(x * y)' = y' * x'$, where $*$ denotes the group operation.

3. Let $\mathcal{G} = \{G; *\}$ be any group, and let $g$ be a fixed element of $G$.

(i) Show that the mapping

$R_g : x \to x * g$

is a one-to-one mapping of $G$ onto $G$.

(ii) Show that the successive mapping $R_{g_1} R_{g_2}$ is the same mapping
as $R_{g_1 * g_2}$.

4. Let $\{S; *\}$ be a system for which
$*$ is associative,

there exists $e \in S$ such that $x * e = x$ for all $x \in S$,
for each $x \in S$ there exists $\bar{x} \in S$ such that $x * \bar{x} = e$.

Prove that $\{S; *\}$ is a group. ($e$ is called a right identity and $\bar{x}$ is called a
right inverse.)

5. Let $\{S; *\}$ be a nonvoid system in which $*$ is associative and $yx^2 = y$
for every $x, y \in S$. Prove that $\{S; *\}$ is a commutative group.

6. Consider the real coordinate plane, and let $R_A$ denote the rotation of
the points of the plane through the angle $A$ around the origin,
counterclockwise if $A > 0$ and clockwise if $A < 0$. The “product” $R_A * R_B$ is
defined to be the transformation which results when $R_A$ is followed by $R_B$.

(i) Find a simple expression for $R_A * R_B$.

(ii) Prove that the system $\mathcal{R} = \{R; *\}$ is a commutative group, where
$R$ is the set of all rotations of the plane around the origin.

7. In three-dimensional space let $X_A$, $Y_A$, and $Z_A$ denote rotations of the
points of space about the $x$, $y$, and $z$ axes, respectively, through the angle $A$,
where the positive direction of rotation is chosen to be counterclockwise as
viewed toward the origin from the positive side of the axis of rotation. As
in Exercise 6, the product of two rotations is defined to be the transformation
which results when the first rotation is followed by the second. Show that
this product is not commutative.

8. Consider the system $\{E; \circ\}$, where $E$ is the set of three symbols
{—, §, /}, and where $\circ$ is defined in the table below by the rule that $x \circ y$
is the symbol in the row which is labeled $x$ at the left and in the column
which is labeled $y$ at the top.

    ○ | —  §  /
   ---+---------
    — | —  §  /
    § | §  /  —
    / | /  —  §

Show that this system is a commutative group.

9. Discuss whatever similarities and distinctions you can detect between 
the systems of Exercise 8 and the group of Exercise 1 (i). 


§A.8. Homomorphisms and Isomorphisms 

We begin by defining homomorphism in the simple case of two systems,
$\mathcal{S} = \{E; *\}$ and $\mathcal{S}' = \{E'; \star\}$, each having one operation.

Definition A.8. A mapping $H$ of $E$ into $E'$ is called a homomorphism
of $\mathcal{S}$ into $\mathcal{S}'$ if and only if for all $a, b \in E$

$(a * b)H = aH \star bH$.

The domain of $H$ is $E$; whenever the range of $H$ is the full set $E'$, $H$ is
called a homomorphism of $\mathcal{S}$ onto $\mathcal{S}'$. A homomorphism $H$ of $\mathcal{S}$ onto $\mathcal{S}'$
is called an isomorphism if and only if $H$ is a one-to-one mapping of $E$
onto $E'$.

Thus a homomorphism is a many-to-one mapping of the elements of one
system into those of another, having the special property that the operation
is preserved by the mapping. By this we mean that the image in $E'$ of the
$*$-product of any two elements of $E$ equals the $\star$-product in $E'$ of the images
in $E'$ of those two elements of $E$. Similarly, an isomorphism of $\mathcal{S}$ onto $\mathcal{S}'$ is a
one-to-one mapping of $E$ onto $E'$ that preserves the operation.

Figure A.2 (a homomorphism $H$ of $E$ into $E'$: the elements $aH \star bH$ and
$(a * b)H$ of $E'$ are equal)

Now consider the more general case of homomorphism of two systems $\mathcal{S}$
and $\mathcal{S}'$, each of which has $m$ relations and $n$ operations. As before, a
homomorphism is a mapping of $E$ into $E'$ that preserves the corresponding
relations and operations. However, it is not immediately clear what is meant by
“corresponding.” Suppose we make a one-to-one pairing of the relations of $\mathcal{S}$
with the relations of $\mathcal{S}'$, and a one-to-one pairing of the operations of $\mathcal{S}$ with
the operations of $\mathcal{S}'$; say, $r_i \leftrightarrow r'_i$, $i = 1, \ldots, m$, and
$o_j \leftrightarrow o'_j$, $j = 1, \ldots, n$.

Definition A.9. A homomorphism of $\mathcal{S}$ into $\mathcal{S}'$ is a mapping $H$ of $E$ into
$E'$ such that

(a) if $a \, r_i \, b$ in $E$, then $aH \, r'_i \, bH$ in $E'$, $i = 1, \ldots, m$,

(b) $(a \, o_j \, b)H = aH \, o'_j \, bH$ for all $a, b \in E$, $j = 1, \ldots, n$.

If $E'$ = range $H$, then $H$ is called a homomorphism of $\mathcal{S}$ onto $\mathcal{S}'$. A
homomorphism $H$ of $\mathcal{S}$ onto $\mathcal{S}'$ is called an isomorphism if and only if $H$
is a one-to-one mapping of $E$ onto $E'$.

Although the notation of (b), above, implicitly assumes that all operations
are binary, it is not difficult to formulate a corresponding statement for a
homomorphism of systems which have n-ary operations.
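For finite systems with a single operation, the defining condition $(a * b)H = aH \star bH$ can be tested by enumeration. The following sketch assumes two illustrative examples that do not appear in the text: reduction modulo 3 on the integers modulo 6, and the map $a \to 2^a$ from the integers modulo 4 (under addition) to the nonzero integers modulo 5 (under multiplication).

```python
def is_homomorphism(H, E, op, op_prime):
    """True if (a * b)H = aH (star) bH for all a, b in the finite set E."""
    return all(H(op(a, b)) == op_prime(H(a), H(b)) for a in E for b in E)


def is_isomorphism(H, E, E_prime, op, op_prime):
    """A homomorphism that is also a one-to-one mapping of E onto E_prime."""
    images = [H(a) for a in E]
    one_to_one = len(set(images)) == len(images)
    onto = set(images) == set(E_prime)
    return is_homomorphism(H, E, op, op_prime) and one_to_one and onto


# Reduction mod 3 maps {0,...,5} with addition mod 6 onto {0,1,2} with addition mod 3.
reduce_mod_3 = lambda a: a % 3
add_mod_6 = lambda a, b: (a + b) % 6
add_mod_3 = lambda a, b: (a + b) % 3

# a -> 2^a (mod 5) maps {0,1,2,3} with addition mod 4 onto {1,2,3,4} with
# multiplication mod 5.
power_of_2 = lambda a: pow(2, a, 5)
add_mod_4 = lambda a, b: (a + b) % 4
mul_mod_5 = lambda a, b: (a * b) % 5
```

The first mapping is a homomorphism onto but is many-to-one, hence not an isomorphism; the second is one-to-one and onto as well, so it is an isomorphism.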



Exercises 

1. Determine whether each of the following mappings is a homomorphism,
an isomorphism, or neither.

(i) The mapping $m \to 2m$ of the additive group of integers into
the additive group of even integers.

(ii) The mapping $a \to 1/a$ of the multiplicative group of nonzero
real numbers into the additive group of real numbers.

(iii) The mapping $a \to 1/a^2$ of the multiplicative group of nonzero
real numbers into the multiplicative group of positive real numbers.

2. Referring to Exercise 9, § 1.5 show by the mapping

$(a, b) \to a + ib$

that $\{C; +, \cdot\}$ is isomorphic to the field of complex numbers.

3. Show that the additive group of real numbers is mapped
homomorphically onto the multiplicative group of all complex numbers which lie on
the unit circle by the mapping

$x \to \cos x + i \sin x$.

4. Let $\mathcal{G} = \{G; *\}$ be any group, and let $\mathcal{J}$ be the additive group of the
integers. For any fixed $g \in G$ show that the mapping

$n \to g * g * g * \cdots * g$ (n times)

is a homomorphism of $\mathcal{J}$ into $\mathcal{G}$. Is this mapping a homomorphism of $\mathcal{J}$ onto $\mathcal{G}$?

5. Let $H$ be a homomorphism of $\mathcal{S} = \{E; *\}$ onto $\mathcal{S}' = \{E'; \star\}$.

(i) Prove that if $\mathcal{S}$ has an identity element $i$, then $\mathcal{S}'$ must have an
identity element $i'$, and that $i' = iH$.

(ii) Prove that if $a \in E$ has an inverse $b$, then $aH$ has an inverse which
must be $bH$.

6. Is the additive group of all real numbers isomorphic to the
multiplicative group of all positive real numbers? Explain.

7. Refer to Exercise 3, § A.7, where the mappings $R_g$ are defined for
any group $\mathcal{G} = \{G; *\}$. The product $R_{g_1} \circ R_{g_2}$ of two such mappings is
defined as the mapping which results when $R_{g_1}$ is followed by $R_{g_2}$. Let

$R_G = \{R_g \mid g \in G\}$.

(i) Show that $\{R_G; \circ\}$ is a group.

(ii) Show that the mapping $g \to R_g$ is an isomorphism of $\{G; *\}$ onto
$\{R_G; \circ\}$, thus proving Cayley's Theorem: Every group is isomorphic to a
transformation group.


§A.9. Equivalence Relations and Partitions 

Definition A.10. An equivalence relation on a set $A$ is a binary relation
$R$ in $A$ which is reflexive, symmetric, and transitive.

(a) Reflexive: $a R a$ for every $a \in A$.

(b) Symmetric: If $a R b$, then $b R a$.

(c) Transitive: If $a R b$ and $b R c$, then $a R c$.

It follows from (a) that dom $R$ = $A$ = range $R$.

We now consider the effect imposed by an equivalence relation $R$ on a
nonvoid set $A$. For each $a \in A$ let $[a]_R$ denote the set of all elements which are
related to $a$ by $R$:

$[a]_R = \{x \in A \mid x R a\}$.

The set $[a]_R$ is called the equivalence class determined by $a$. Since $R$ is reflexive,
$a R a$, so $a \in [a]_R$. This implies that the set union of all the equivalence
classes equals $A$. If $b \in [a]_R$, then $b R a$, and $a R b$ since $R$ is symmetric.
Hence if $b \in [a]_R$, then $a \in [b]_R$. Suppose also that $y \in [b]_R$. Then $y R b$
and $b R a$, so transitivity implies $y R a$; hence $y \in [a]_R$, and $[b]_R \subseteq [a]_R$.
By reversing the roles of $a$ and $b$ we obtain $[a]_R \subseteq [b]_R$, so $[b]_R = [a]_R$
whenever $b \in [a]_R$. This implies that two equivalence classes are either equal or
have no elements in common.

Therefore, an equivalence relation $R$ on a set $A$ decomposes $A$ into disjoint
subsets, called equivalence classes. Such a decomposition of a set is called a
partition.

Definition A.11. A partition of a set $A$ is a collection $\mathcal{P}$ of subsets of $A$,
called classes of the partition, such that

(a) if $x \in A$, then $x \in C$ for some $C \in \mathcal{P}$, and

(b) if $C, D \in \mathcal{P}$, then either $C = D$ or $C \cap D = \Phi$.

Now suppose we reverse the situation and start with any partition $\mathcal{P}$ of $A$.
A relation $R$ can be defined on $A$ by writing $a R b$ if and only if $a$ and $b$ are
in the same class of $\mathcal{P}$. It can be proved as an exercise that $R$ is an
equivalence relation on $A$ for which the equivalence classes are the classes of $\mathcal{P}$.
Therefore, each equivalence relation on $A$ determines a partition of $A$, and,
conversely, each partition determines an equivalence relation.
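The passage from an equivalence relation to a partition and back can be traced on a small finite set. In the sketch below the set {0, ..., 9} and the relation "same remainder on division by 3" are illustrative assumptions, as are the function names.

```python
def equivalence_classes(A, related):
    """Collect the distinct classes [a] = {x in A | x R a}."""
    classes = []
    for a in A:
        cls = frozenset(x for x in A if related(x, a))
        if cls not in classes:
            classes.append(cls)
    return classes


def is_partition(A, classes):
    """Definition A.11: the classes cover A and are pairwise equal or disjoint."""
    covers = set().union(*classes) == set(A)
    disjoint = all(C == D or not (C & D) for C in classes for D in classes)
    return covers and disjoint


def relation_from_partition(classes):
    """a R b if and only if a and b lie in the same class."""
    return lambda a, b: any(a in C and b in C for C in classes)


A = range(10)
same_remainder = lambda x, y: x % 3 == y % 3

classes = equivalence_classes(A, same_remainder)
rebuilt = equivalence_classes(A, relation_from_partition(classes))
```

Here `classes` is a partition of `A`, and the relation rebuilt from that partition yields the same classes again, which is the round trip described in the last paragraph.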

Exercises 

1. Given a symmetric and transitive relation $R$ in a set $A$. Is $R$ reflexive?
Distinguish between the cases dom $R = A$ and dom $R \subset A$.

2. Consider the points of a plane $P$ as described in a rectangular coordinate
system by pairs of real numbers, and let $m$ be a fixed real number. Define
on $P$ the relation $M$ as follows:

$(x_1, y_1) \, M \, (x_2, y_2)$ if and only if $y_1 - y_2 = m(x_1 - x_2)$.

(i) Prove that $M$ is an equivalence relation.

(ii) Describe geometrically the equivalence classes of $M$.

3. Given a partition $\mathcal{P}$ of a set $A$. Show, as indicated in the text, that a
corresponding equivalence relation $R$ can be defined on $A$ such that the
equivalence classes of $R$ are the classes of the partition $\mathcal{P}$.

4. Let $E(n)$ denote the number of different partitions which can be defined
on a set of $n$ identical objects.

(i) Evaluate $E(n)$ for $n = 1, 2, 3, 4$.

(ii) Relate $E(n)$ to the number of different ways the integer $n$ may be
written as the sum of positive integers in nonincreasing order, for example,

5 = 3 + 2 = 3 + 1 + 1 = 4 + 1 and so on.

§A.10. Cosets 

We conclude this discussion of general algebraic concepts with one further
notion of widespread applicability in the study of algebraic systems. In
doing so we shall discover an intimate connection between mappings,
equivalence relations, homomorphisms, and isomorphisms.

Let $\mathbf{R}$ be a relation from $A$ into $B$, and suppose $A$ = dom $\mathbf{R}$. We first
show how $\mathbf{R}$ can be used to define a relation $R$ in $A$. Each $a \in A$ is related
to one or more $b \in B$. Another element $x \in A$ might also be related to one
or more of those same elements of $B$. With this fact in mind we make the
following definition.

Definition A.12. The $\mathbf{R}$-coset of $a \in A$ is the set

$(a)_{\mathbf{R}} = \{x \in A \mid x \mathbf{R} b \text{ and } a \mathbf{R} b \text{ for some } b \in B\}$.

We immediately deduce from the definition that

$a \in (a)_{\mathbf{R}}$

and

$x \in (a)_{\mathbf{R}}$ if and only if $a \in (x)_{\mathbf{R}}$.

A relation $R$, which we call the relation induced in $A$ by the $\mathbf{R}$-cosets, is
defined by writing

$x R a$ if and only if $x \in (a)_{\mathbf{R}}$.

It follows that dom $R$ = $A$ = range $R$, and that $R$ is both reflexive and
symmetric. In the example illustrated in Figure A.1, we observe that
$(a_1)_{\mathbf{R}} = \{a_1, a_2\}$, $(a_2)_{\mathbf{R}} = \{a_1, a_2, a_3\}$, and $(a_3)_{\mathbf{R}} = \{a_2, a_3\}$.
This shows that $R$ is not always transitive.
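The failure of transitivity can be reproduced with a small computation. Figure A.1 is not reproduced here, so the relation below is an assumed one, chosen only because it yields the three cosets quoted above; the names are hypothetical.

```python
# An assumed relation from A = {a1, a2, a3} into B = {b1, b2}, given as pairs.
R = {("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a3", "b2")}
A = {a for (a, _) in R}


def coset(R, a):
    """The R-coset of a: all x sharing at least one R-image with a."""
    images_of_a = {b for (x, b) in R if x == a}
    return {x for (x, b) in R if b in images_of_a}


def induced(x, a):
    """The induced relation: x is related to a iff x lies in the coset of a."""
    return x in coset(R, a)


reflexive = all(induced(a, a) for a in A)
symmetric = all(induced(a, x) for a in A for x in A if induced(x, a))
transitive = all(induced(x, z)
                 for x in A for y in A for z in A
                 if induced(x, y) and induced(y, z))
```

With this relation, a1 is related to a2 and a2 to a3, but a1 and a3 share no image in B, so the induced relation is reflexive and symmetric without being transitive.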

However, suppose that the original relation $\mathbf{R}$ is actually a mapping $\mathbf{F}$, so
that each $a \in A$ has a unique image $a\mathbf{F} \in B$. Then the $\mathbf{F}$-coset of $a$ is

$(a)_{\mathbf{F}} = \{x \in A \mid x\mathbf{F} = a\mathbf{F}\}$.

Therefore, the relation $F$ induced in $A$ by the $\mathbf{F}$-cosets is characterized by
the statement

$x F a$ if and only if $x\mathbf{F} = a\mathbf{F}$.

We now prove that this reflexive and symmetric relation is also transitive.
Let $y F x$ and $x F a$; then $y\mathbf{F} = x\mathbf{F} = a\mathbf{F}$, so $y F a$. Hence $F$ is an equivalence
relation. Furthermore, the $F$-equivalence class which contains $a$ is simply
the $\mathbf{F}$-coset of $a$, for

$[a]_F = \{x \in A \mid x F a\} = \{x \in A \mid x\mathbf{F} = a\mathbf{F}\} = (a)_{\mathbf{F}}$.

These results are summarized and an additional fact asserted in the following
theorem.

Theorem A.1. Let $\mathbf{R}$ be a relation from $A$ into $B$ for which $A$ = dom $\mathbf{R}$,
and let $R$ be the relation induced on $A$ by the $\mathbf{R}$-cosets of $A$. Then $R$
is reflexive and symmetric. If $\mathbf{R}$ is a mapping $\mathbf{F}$, then $F$ is an
equivalence relation whose equivalence classes are the $\mathbf{F}$-cosets. Let $\bar{A}$ be the
collection of these equivalence classes; there exist a mapping $K$ from $A$
onto $\bar{A}$ and a one-to-one mapping $J$ of $\bar{A}$ onto range $\mathbf{F} \subseteq B$ such that

$\mathbf{F} = KJ$.

Proof: To verify the last sentence of the theorem we define the
mapping $K$ from $A$ onto $\bar{A}$ as follows:

$K: a \to [a]_F$.

$K$ maps each $a \in$ dom $\mathbf{F}$ into the equivalence class ($\mathbf{F}$-coset) of all
elements $x$ for which $x\mathbf{F} = a\mathbf{F}$. The mapping $J$ of $\bar{A}$ onto range $\mathbf{F}$ is
defined by

$J: [a]_F \to a\mathbf{F}$.

It is necessary now to consider an important technical point: the
description just given as a definition of the mapping $J$ is open to criticism, since
the $J$-image of the equivalence class $[a]_F$ was specified as $a\mathbf{F}$, the $\mathbf{F}$-image
of one of the members of $[a]_F$. At first glance it appears that this
definition might be ambiguous, because different members of $[a]_F$ might have
different $\mathbf{F}$-images in $B$. But $x \in [a]_F$ if and only if $x\mathbf{F} = a\mathbf{F}$, which
shows that all members of $[a]_F$ do have the same $\mathbf{F}$-image. Hence, $J$ is
properly defined, after all.

To prove that $J$ is one-to-one, suppose $[a]_F J = [b]_F J$. Then $a\mathbf{F} = b\mathbf{F}$,
so $b \in [a]_F$. But two equivalence classes which are not disjoint must
be equal, so $[a]_F = [b]_F$. Finally, $aK = [a]_F$, and $aKJ = [a]_F J = a\mathbf{F}$,
so $KJ = \mathbf{F}$.
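The factorization of a mapping into "send each element to its coset, then send each coset to the common image" can be sketched for a finite domain. The domain {-3, ..., 3} and the squaring map are assumptions chosen for illustration; `factor` is a hypothetical helper name.

```python
def factor(A, F):
    """Return dictionaries K and J with J[K[a]] == F(a) for every a in A.

    K sends a to its F-coset; J sends each coset to the shared F-image.
    """
    K = {a: frozenset(x for x in A if F(x) == F(a)) for a in A}
    J = {}
    for a in A:
        # Well defined: every member of the coset K[a] has the same F-image.
        J[K[a]] = F(a)
    return K, J


A = range(-3, 4)
F = lambda x: x * x

K, J = factor(A, F)
composite_is_F = all(J[K[a]] == F(a) for a in A)
J_is_one_to_one = len(set(J.values())) == len(J)
```

In this example the cosets are {0}, {-1, 1}, {-2, 2}, and {-3, 3}; the mapping `K` is onto the collection of cosets, and `J` carries distinct cosets to distinct squares.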

The following diagram might help to fix in mind the essential points of this 
theorem. 
Figure A.3 (diagram of the factorization $\mathbf{F} = KJ$, with dom $\mathbf{F} = A$ and
range $\mathbf{F} \subseteq B$)

This analysis of the role of cosets in the theory of mappings takes on special
significance when it is applied to homomorphisms of abstract systems. Let
$H$ be a homomorphism of $\mathcal{S} = \{E; *\}$ onto $\mathcal{S}' = \{E'; \star\}$. From the theorem
just proved, we know that the $H$-cosets are equivalence classes which form a
partition of $E$. Let $\bar{E}$ be the collection of the $H$-cosets of $E$. The mapping $H$
can be represented as the successive mappings $KJ$, where $K$ maps each
$x \in E$ into its $H$-coset, $[x]_H$, and where $J$ is the one-to-one mapping of $\bar{E}$
onto $E'$ which assigns to $[x]_H$ the image $xH$:

$x \xrightarrow{K} [x]_H \xrightarrow{J} xH$.

But much more can be said; because of the operation $*$ which is defined on
$E$, it is possible to define an operation $\cdot$ on $\bar{E}$ by the rule

$[x]_H \cdot [y]_H = [x * y]_H$.

It is necessary again to prove that the definition is not ambiguous; that is, to
prove that the “product” of two cosets is independent of the choice of
representative elements. Suppose $a \in [x]_H$ and $b \in [y]_H$. Then $aH = xH$ and
$bH = yH$. Since $H$ is a homomorphism,

$(a * b)H = aH \star bH = xH \star yH = (x * y)H$.

Hence, $a * b \in [x * y]_H$, and finally $[a * b]_H = [x * y]_H$ because the $H$-cosets
form a partition of $E$. Therefore, the operation $\cdot$ is properly defined on $\bar{E}$,
and $\bar{\mathcal{S}} = \{\bar{E}; \cdot\}$ is an abstract system.

Now consider the mapping $K$ of $E$ onto $\bar{E}$. We have

$(x * y)K = [x * y]_H = [x]_H \cdot [y]_H = xK \cdot yK$.

Hence $K$ is a homomorphism of $\mathcal{S}$ onto $\bar{\mathcal{S}}$. Likewise, the one-to-one mapping
$J$ of $\bar{E}$ onto $E'$ preserves the corresponding operation:

$([x]_H \cdot [y]_H)J = [x * y]_H J = (x * y)H = xH \star yH = [x]_H J \star [y]_H J$.

Therefore $J$ is an isomorphism of the systems $\bar{\mathcal{S}}$ and $\mathcal{S}'$.

Fundamental Isomorphism Theorem. Let $H$ be a homomorphism
of the system $\mathcal{S} = \{E; *\}$ onto the system $\mathcal{S}' = \{E'; \star\}$. Then a product
$\cdot$ can be defined on the set $\bar{E}$ of $H$-cosets such that

(a) there exists a mapping $K$ which is a homomorphism of the system $\mathcal{S}$
onto the system $\bar{\mathcal{S}} = \{\bar{E}; \cdot\}$,

(b) there exists a mapping $J$ which is an isomorphism of the system $\bar{\mathcal{S}}$
onto the system $\mathcal{S}'$,

(c) $H = KJ$.

Finally, we remark that no specific properties of any of the systems were
assumed, and therefore the preceding theorem is extremely general. In various
applications the system $\bar{\mathcal{S}}$ of cosets is called the factor-system, quotient-system,
or difference-system of $\mathcal{S}$.
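The three assertions of the theorem can be verified by enumeration in a small case. The sketch assumes an example not taken from the text: $E$ is {0, ..., 5} under addition modulo 6, $E'$ is {0, 1, 2} under addition modulo 3, and $H$ is reduction modulo 3.

```python
E = range(6)
star = lambda a, b: (a + b) % 6          # the operation * on E
star_prime = lambda a, b: (a + b) % 3    # the operation on E' = {0, 1, 2}
H = lambda x: x % 3                      # a homomorphism of S onto S'


def coset(x):
    """The H-coset [x]_H: all elements with the same H-image as x."""
    return frozenset(y for y in E if H(y) == H(x))


E_bar = {coset(x) for x in E}


def dot(C, D):
    """Product of cosets; raises if it depends on the representatives."""
    results = {coset(star(a, b)) for a in C for b in D}
    if len(results) != 1:
        raise ValueError("product depends on the choice of representatives")
    return next(iter(results))


K = coset                                # K: x -> [x]_H


def J(C):
    """J: [x]_H -> xH, computed from any representative of the coset."""
    return H(next(iter(C)))


K_is_homomorphism = all(K(star(x, y)) == dot(K(x), K(y)) for x in E for y in E)
J_preserves_product = all(J(dot(C, D)) == star_prime(J(C), J(D))
                          for C in E_bar for D in E_bar)
J_is_one_to_one_onto = sorted(J(C) for C in E_bar) == [0, 1, 2]
H_equals_KJ = all(J(K(x)) == H(x) for x in E)
```

The function `dot` tries every pair of representatives, so the fact that it never raises an error here is exactly the statement that the coset product is unambiguously defined.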

Exercises 

1. Describe the $\mathbf{R}$-cosets of each of the following relations.

(i) The order relation < for real numbers.

(ii) The relation “$a$ divides $b$” for positive integers.

(iii) The relation “$L_1$ is perpendicular to $L_2$” for lines of the plane.

2. Let $S$ be the set of all humans living at a given instant, and let $\mathbf{F}$ be
the age function, $s\mathbf{F}$ being the age in years of $s$ on his most recent birthday.
Describe the $\mathbf{F}$-cosets.

3. Refer to the relation $\mathbf{R}$ of congruence modulo $n$, as defined for integers
in Exercise 4, § 1.3.

(i) Describe the $\mathbf{R}$-cosets.

(ii) Let $I_n$ denote the collection of these cosets, which we denote by
$(0)_n, (1)_n, \ldots, (n - 1)_n$. We define on $I_n$ two operations:

$(a)_n \oplus (b)_n = (a + b)_n$,

$(a)_n \odot (b)_n = (ab)_n$.

Verify that these operations are unambiguously defined.

(iii) Show that the system $\{I_n; \oplus\}$ is a commutative group.

(iv) Show that the system $\mathcal{F}_n = \{I_n; \oplus, \odot\}$ is a field if and only if $n$
is prime.

(v) Show that the mapping $H$ of the integers onto $I_n$ is a
homomorphism, where $H$ is defined by

$xH = (a)_n$ if and only if $x \equiv a \pmod{n}$, $a = 0, \ldots, n - 1$.

4. Let $\mathcal{S} = \{E; *\}$ be a group whose identity is denoted $i$, and let $H$ be a
homomorphism of $\mathcal{S}$ onto $\mathcal{S}' = \{E'; \star\}$. From Exercise 5, § A.8 we know that
$iH$ is the identity of $\mathcal{S}'$. The $H$-coset $(i)_H$ is called the kernel of $H$. Prove that
$H$ is an isomorphism if and only if the kernel of $H$ has $i$ as its only element.

5. Let $H$ be a homomorphism of a field $\mathcal{F}$ into a field $\mathcal{F}'$ and suppose that
$aH \neq 0'$ for some $a \in \mathcal{F}$. Prove that $H$ is an isomorphism.

Matrix Notation


A student beginning his study of linear algebra might find it disconcerting 
to discover that no two books on the subject seem to use the same notation. 
Some use Greek letters for scalars and Latin letters for vectors; others reverse 
the choice. Some use bold face type for vectors; others use it for mappings. 
But more basic than these simple typographical differences, some authors 
write the symbol for a mapping to the left of the symbol for the element 
that is mapped, others write it to the right. This lack of agreement among 
mathematicians about where to write symbols for mappings is a potential 
source of confusion for the inexperienced reader, and the purpose of this 
brief appendix is to reduce the confusion by examining the two most common 
conventions and their consequences in matrix algebra. 

We begin by recognizing that a matrix can represent various mathematical 
objects, each in accordance with some notational convention. Different
conventions for representing a given mathematical object by a matrix usually
lead to different forms of the same matrix theorem. But it is the mathematical
object itself, rather than its matrix representation, that attracts our interest. 
For this reason, algebraic operations for matrices are defined in such a way 
that the algebra of matrices reflects the algebra of the objects thus represented. 


§B.1. Mappings

The most useful mathematical concept frequently represented by a matrix
is a linear mapping, or a vector space homomorphism. If $\alpha$ is a vector and
$T$ a mapping, the image of $\alpha$ under $T$ is usually denoted either by $T\alpha$ or
by $\alpha T$. The latter notation, in which $T$ is written to the right of $\alpha$, is called
right-hand notation; the former, with $T$ to the left of $\alpha$, is called left-hand
notation. The image of $\alpha$ under the composite mapping, $T$ followed by $S$,
is denoted by $\alpha TS$ in right-hand notation and by $ST\alpha$ in left-hand notation.
Of course, both symbols have exactly the same meaning: the vector $\alpha$ is
mapped by $T$, and then the resulting vector is mapped by $S$. Thus in
right-hand notation $\alpha TS$ is read from left to right; in left-hand notation $ST\alpha$ is read
from right to left. Both describe the same sequence of events.


§B.2. Linear Transformations

Suppose that $T$ is a vector space homomorphism from $\mathcal{V}_m$ to $\mathcal{W}_n$. As
described in § 4.1, in order to represent $T$ by a matrix we choose a basis
$\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ for $\mathcal{V}_m$ and a basis $\{\beta_1, \beta_2, \ldots, \beta_n\}$ for $\mathcal{W}_n$, and we express
the image of each vector of the $\alpha$-basis as a linear combination of the vectors
of the $\beta$-basis:

$$\begin{aligned}
\alpha_1 T &= a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1n}\beta_n = T\alpha_1 \\
&\;\;\vdots \\
\alpha_i T &= a_{i1}\beta_1 + a_{i2}\beta_2 + \cdots + a_{in}\beta_n = T\alpha_i \\
&\;\;\vdots \\
\alpha_m T &= a_{m1}\beta_1 + a_{m2}\beta_2 + \cdots + a_{mn}\beta_n = T\alpha_m
\end{aligned}$$

In right-hand notation $T$ is represented by the $m \times n$ matrix $A$ of coefficients
of the $\beta_j$ in the array of these equations.

(B.1-R)
$$T \to A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}$$

Each row of the representative matrix $A$ specifies the coefficients of the
image of the corresponding basis vector:

(B.2-R) $\alpha_i T = \sum_{j=1}^{n} a_{ij}\beta_j$.

A vector $\xi = \sum_{i=1}^{m} x_i\alpha_i \in \mathcal{V}_m$ is represented by the row vector
$X = (x_1, \ldots, x_m)$. Then $\xi T$ is represented by the row vector $XA$. If a linear
mapping $S$ from $\mathcal{W}_n$ to $\mathcal{Y}_p$ is represented by a matrix $C$ relative to the $\beta$-basis for
$\mathcal{W}_n$ and a $\gamma$-basis for $\mathcal{Y}_p$, then $TS$ is represented by the matrix product $AC$.
Similarly, $\xi TS$ is represented by $XAC$.

In left-hand notation $T$ is represented by the $n \times m$ matrix $B$ of coefficients
of the $\beta_j$ in the transposed array of the equations.

(B.1-L)
$$T \to B = \begin{pmatrix}
a_{11} & \cdots & a_{i1} & \cdots & a_{m1} \\
a_{12} & \cdots & a_{i2} & \cdots & a_{m2} \\
\vdots & & \vdots & & \vdots \\
a_{1n} & \cdots & a_{in} & \cdots & a_{mn}
\end{pmatrix}$$

Thus $b_{ji} = a_{ij}$, and $B = A'$. Each column of the representative matrix $B$
specifies the coefficients of the image of the corresponding basis vector:

(B.2-L) $T\alpha_i = \sum_{j=1}^{n} b_{ji}\beta_j$.

The vector $\xi = \sum_{i=1}^{m} x_i\alpha_i$ is represented by the column vector $U$, where

$U = X'$, the column with entries $x_1, \ldots, x_m$.

Then $T\xi$ is represented by the matrix product $BU$. We note that

$BU = A'X' = (XA)'$;

thus the column vector $BU$ which represents $T\xi$ in left-hand notation is the
transpose of the row vector which represents $\xi T$ in right-hand notation. The
transformation $S$ from $\mathcal{W}_n$ to $\mathcal{Y}_p$ will be represented by a $p \times n$ matrix
$D = C'$. Thus $ST$ is represented by $DB = C'A' = (AC)'$, and $ST\xi$ is
represented by $DBU = C'A'X' = (XAC)'$.

In short, the matrix representation for vectors and linear transformations 
using left-hand notation is the transpose of the matrix representation ob- 
tained using right-hand notation. Furthermore, in both systems of repre- 
sentation the algebra of matrices reflects that of linear transformations. 
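The transpose relationship between the two systems can be checked numerically. The matrices below are arbitrary small examples (assumptions, not taken from the text), and `matmul` and `transpose` are minimal helper functions written for this sketch.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*M)]


# Right-hand notation: rows of A are the images of the basis vectors.
A = [[1, 2, 0],
     [0, 1, 3]]            # 2 x 3, represents T from V_2 to W_3
C = [[1, 0],
     [2, 1],
     [0, 4]]               # 3 x 2, represents S from W_3 to Y_2
X = [[5, 7]]               # row vector representing xi

# Left-hand notation: every representative is the transpose.
B, D, U = transpose(A), transpose(C), transpose(X)

right = matmul(matmul(X, A), C)      # XAC, a row vector
left = matmul(D, matmul(B, U))       # DBU, a column vector
```

The column `left` is the transpose of the row `right`: both record the coordinates of the same vector, the image of $\xi$ under $T$ followed by $S$.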

These observations provide a direct method by which a statement about
matrices which expresses a fact about linear transformations in either system
of notation can be translated into a matrix statement which expresses the
same result in the other system. For example, consider matrix equivalence,
which in right-hand notation is expressed in § 0.5 by

(B.3-R) $A = PCQ^{-1}$.

Then

$A' = (PCQ^{-1})' = (Q')^{-1}C'P'$.

But each linear transformation is represented by a certain matrix in one
system of notation and by the transpose of that matrix in the other system.
Hence $A$ and $A'$ represent $T$, $C$ and $C'$ represent $T$ relative to another pair
of bases, $P$ and $P'$ represent the change of basis in $\mathcal{V}_m$, and $Q$ and $Q'$
represent the change of basis in $\mathcal{W}_n$. If we let $B = A'$, $D = C'$, $R = P'$, and
$S = Q'$, then we obtain a characterization of matrix equivalence, expressed
in left-hand notation: matrices $B$ and $D$ are equivalent if and only if there
exist nonsingular matrices $R$ and $S$ such that

(B.3-L) $B = S^{-1}DR$.

Similarity takes the form

(B.4-R) $A = PCP^{-1}$ in right-hand notation,

(B.4-L) $B = R^{-1}DR$ in left-hand notation.

§B.3. Systems of Linear Equations 

Now consider a system of linear equations of the form (5.1):

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= y_m
\end{aligned}$$

In order to represent this system by the matrix $A$ of coefficients arranged
precisely in the array given by the equations themselves, we use left-hand
notation:

(B.5-L) $AX = Y$,

where $X$ and $Y$ are column vectors representing $\xi \in \mathcal{W}_n$ and $\eta \in \mathcal{V}_m$, and
where $A$ represents a linear transformation $T$ from $\mathcal{W}_n$ to $\mathcal{V}_m$:

$T\beta_i = \sum_{j=1}^{m} a_{ji}\alpha_j$,

$T\xi = \eta$.

(The definitions of $T$, $\xi$, $\eta$, $X$, and $Y$ are different from those in § B.2.) But the
system can also be expressed by

(B.5-R) $X'A' = Y'$,

where $X'$ and $Y'$ are row vectors. Now in right-hand notation $A'$ represents
$T$, $X'$ and $Y'$ represent $\xi$ and $\eta$, and $\xi T = \eta$.
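A short computation confirms that the column form and the row form carry the same information. The coefficients below are an arbitrary illustration, and `matmul` and `transpose` are minimal helpers written for this sketch.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*M)]


# Two equations in three unknowns, coefficients exactly as in the array:
#   2*x1 + 1*x2 + 0*x3 = y1
#   1*x1 + 3*x2 + 1*x3 = y2
A = [[2, 1, 0],
     [1, 3, 1]]
X = [[1], [2], [3]]                          # column vector of unknowns

Y = matmul(A, X)                             # left-hand form: AX = Y
Y_row = matmul(transpose(X), transpose(A))   # right-hand form: X'A' = Y'
```

The row `Y_row` is the transpose of the column `Y`, so either form may be used; the left-hand form has the convenience that $A$ is the coefficient array exactly as written.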


§B.4. Conjugate Bilinear Functions 

Finally, consider the matrix representation of a conjugate bilinear function
$f$, as discussed in § 8.1. In either left-hand or right-hand notation, $f$ is
represented relative to bases $\{\alpha_i\}$ for $\mathcal{V}_m$ and $\{\beta_j\}$ for $\mathcal{W}_n$ by the matrix $A$ where
$a_{ij} = f(\alpha_i, \beta_j)$. In right-hand notation, vectors $\xi$ and $\eta$ are represented by
row vectors $X$ and $Y$, and

(B.6-R) $f(\xi, \eta) = XA\bar{Y}'$.

Conjunctivity of $A$ and $C$ is expressed by

(B.7-R) $A = PC\bar{P}'$,

and congruence is expressed by

(B.8-R) $A = PCP'$.

In left-hand notation, $\xi$ and $\eta$ are represented by column vectors $U$ and
$V$, and

(B.6-L) $f(\xi, \eta) = U'A\bar{V}$.

Conjunctivity of $A$ and $C$ is expressed by

(B.7-L) $A = P'C\bar{P}$,

and congruence is expressed by

(B.8-L) $A = P'CP$.

Sometimes a further modification is employed for left-hand notation by
defining a conjugate bilinear function to be conjugate linear in the first
component and linear in the second. With that convention we have

$f(\xi, \eta) = \bar{U}'AV$,

and conjunctivity becomes

$A = \bar{P}'CP$.

Congruence remains in the form

$A = P'CP$.

You are strongly encouraged to verify the statements of this Appendix by 
direct computation, observing carefully how and where the distinctions arise, 
yet discerning that corresponding matrix statements have precisely the same 
meaning when interpreted as statements about mappings. 
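One such direct computation is sketched below for the conjugate bilinear case. It assumes the convention that $f$ is linear in the first component and conjugate linear in the second, so that $f(\xi, \eta) = XA\bar{Y}'$ in right-hand notation; the matrix, vectors, and helper names are illustrative assumptions.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*M)]


def conj(M):
    """Entrywise complex conjugate."""
    return [[z.conjugate() for z in row] for row in M]


A = [[1 + 1j, 2],
     [0, 3 - 1j]]
X = [[1, 1j]]              # row vector representing xi
Y = [[2, 1 - 1j]]          # row vector representing eta


def f_right(X, Y):
    """Right-hand notation: X A conj(Y)'."""
    return matmul(matmul(X, A), transpose(conj(Y)))[0][0]


def f_left(U, V):
    """Left-hand notation with column vectors: U' A conj(V)."""
    return matmul(matmul(transpose(U), A), conj(V))[0][0]


c = 2 + 3j
value_right = f_right(X, Y)
value_left = f_left(transpose(X), transpose(Y))
scaled_second = f_right(X, [[c * y for y in Y[0]]])
scaled_first = f_right([[c * x for x in X[0]]], Y)
```

Both notations give the same scalar, and scaling the second argument by $c$ multiplies the value by the conjugate of $c$, while scaling the first multiplies it by $c$ itself.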

Suggestions and Answers
for Selected Exercises 


CHAPTER 1 

§ 1.1. 1. (i) Let $\mathcal{K}$ represent the club, and let $x$ and $y$ represent any two
distinct members of $\mathcal{K}$. By (b) there is a committee $C$ to which $x$ and $y$
belong. By (c) there is a member $z$ of $\mathcal{K}$ who does not serve on $C$. Now
finish the proof by using (b) again.

(ii) Let $C$ be any committee and let $x$ be any member of $C$. Use
(c) and (d) to obtain a student $y$ not on $C$, and a committee $C_1$ which
has no members in common with $C$. By (b) $x$ and $y$ serve on a uniquely
determined committee $C_2$, and by (c) some student $z$ does not serve on
$C_2$. Then $y$ and $z$ determine a committee $C_3$. Now consider the
possibilities of $C$ and $C_3$ having a member in common; to complete the
proof you will have to construct a $C_4$ also.

§ 1.2. 1. (i) There are 10 distinct subsets. 

(ii) $2^m$.

4. (iii) No. 

5. (iii) Yes. 

(v) Yes. 

7. $n(A \cup B \cup C) = n(A) + n(B) + n(C) - n(A \cap B) - n(A \cap C)
- n(B \cap C) + n(A \cap B \cap C)$.

§ 1.3. 1. (ii) There are seven pairs in the subset. 

(iii) There are six pairs in the subset. 

2. $S \times S$ has at least sixteen elements; hence at least $2^{16}$ subsets.

§ 1.4. 1. (i) dom $FG$ = dom $F$ = $S$; range $FG \subseteq$ range $G \subseteq U$.

§ 1.5. 1. (i) Show that if both $e$ and $i$ are identities relative to $*$, then
$e = i$.

(ii) Let $\bar{x}$ and $x'$ be inverses of $x$, and consider $\bar{x} * (x * x')$.

3. (i) Yes.

(ii) Yes.

(iii) No.

§ 1.6. 1. (i) Consider a O (o © t).

(ii) Consider (a_ © b) ® (a O 6 ), and use (i).

(iii) Consider b' O [ 6 _ O (&_)'], and use (ii).

3. Show that the operations are closed and that the field postulates 
are satisfied. 

CHAPTER 2 

§ 2.1. 2. Only (v) and (x) form groups.

§ 2.2. 2. The space of all real n-tuples is isomorphic to the space of all real
polynomials of degree not exceeding $n - 1$.

5. No. Why not?

6. Yes; yes; no; yes; no.

§ 2.3. 1. Yes; no; yes; no; yes; yes; yes; no; yes; yes. 

2. (iii) Not a group. 

4. S n 3 c [(-3, 1,5)]. 

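6. Let $\xi \in (\mathcal{R} + \mathcal{S}) \cap \mathcal{T}$ and write $\xi = \rho + \sigma$, where $\rho \in \mathcal{R}$ and
$\sigma \in \mathcal{S}$. Prove that $\xi \in (\mathcal{R} \cap \mathcal{T}) + (\mathcal{S} \cap \mathcal{T})$, and finally use the result of
Exercise 5.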

§ 2.4. 2 . The number of vectors in a maximal linearly independent subset 
for each part is 

(i) three; 

(ii) four; 

(iii) two. 

§ 2.5. 5. $\epsilon_1 = \alpha_1 - \alpha_2$, $\epsilon_2 = \alpha_2 - \alpha_3$, $\epsilon_3 = \alpha_3 - \alpha_4$, $\epsilon_4 = \alpha_4$.

§ 2.6. 2. (a) n.

(b) 0.

(c) 1.

(d) n + 1.

(e) Infinite. 

4. (i) Choose a basis $\{\alpha_1, \ldots, \alpha_k\}$ for $\mathcal{S}$; extend to a basis $\{\alpha_1, \ldots, \alpha_n\}$
for $\mathcal{V}$, and let $\mathcal{T} = [\alpha_{k+1}, \ldots, \alpha_n]$.

5. Use Theorem 2.14 and Exercise 5, § 2.3. 

§ 2.7. 1. Map $a_0x^n + a_1x^{n-1} + \cdots + a_n$ onto the (n + 1)-tuple
$(a_0, a_1, \ldots, a_n)$.

7. (i) Yes; no. 

(iii) Yes; yes; no. 
CHAPTER 3

§ 3.1. 4. Write $\theta = \alpha + (-\alpha)$.

8. Only (ii) and (iv) are linear. 

10. (ii) No. 

(iii) Show that MD — DM maps any polynomial into itself. 

(iv) Consider D(MD - DM)M. 

§ 3.2. 1. If $\xi \in \mathcal{R}_{TS}$, then $\xi = \eta TS$ for some $\eta \in \mathcal{V}$. Hence $\xi = \alpha S$, where
$\alpha = \eta T \in \mathcal{W}$. Hence $\xi \in \mathcal{R}_S$.

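3. (i) Show that $\mathcal{R}_{T+S} \subseteq \mathcal{R}_T + \mathcal{R}_S$ and then use Theorem 2.13.

(iii) Choose a basis $\{\alpha_1, \ldots, \alpha_k\}$ for $\mathcal{N}_T$; extend to a basis
$\{\alpha_1, \ldots, \alpha_m\}$ for $\mathcal{N}_{TS}$, and show that $\{\alpha_{k+1}T, \ldots, \alpha_mT\}$ is a basis for
the space $\mathcal{N}_{TS}T$. Finally, observe that $\mathcal{N}_{TS}T \subseteq \mathcal{N}_S$ and complete the
proof.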

7. If $\rho(T) = 1$, each $\xi \in \mathcal{V}$ is mapped into a scalar multiple of some
fixed vector $\alpha$: $\xi T = k_\xi \alpha$, where $k_\xi$ depends upon $\xi$. Choose $c = k_\alpha$.

§ 3.3. 3. $ad \neq bc$.

5. (ii) All except DJ. 

(iii) All except D and DJ. 

§ 3.4. 1. Let $\xi = c(\alpha f)^{-1}\alpha$, where $\alpha$ is any vector for which $\alpha f \neq 0$.

3. $(x_1, x_2, x_3)f_i = \tfrac{1}{2}(x_1 + x_2 + x_3 - 2x_i)$, i = 1, 2, 3.

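§ 3.5. 3. To show that S' is nonsingular, suppose that $fS' = \theta \in \mathcal{V}'$ for
some $f \in \mathcal{W}'$; then $\alpha Sf = 0$ for all $\alpha \in \mathcal{V}$. Since S is nonsingular,
$\beta f = 0$ for all $\beta \in \mathcal{W}$, so $f = \theta \in \mathcal{W}'$.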

§ 3.6. 1. (i) Let $\xi = \sum_{i=1}^{m} a_i\alpha_i$ and $\eta = \sum_{j=1}^{m} b_j\alpha_j$; calculate $\xi\eta$.

2. (i) Calculate $(a_1 1 + a_2 i + a_3 j + a_4 k)(b_1 1 + b_2 i + b_3 j + b_4 k)$.

(ii) Try $(a_1 1 - a_2 i - a_3 j - a_4 k)$ as an inverse of $(a_1 1 + a_2 i +
a_3 j + a_4 k)$.
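
The calculation suggested in (i) and (ii) can be sketched as follows. This is only an illustrative check, assuming the usual multiplication rules for 1, i, j, k (i² = j² = k² = −1, ij = k, jk = i, ki = j); the function names `qmul` and `conj` are hypothetical and not from the text. It shows that the trial element of (ii) gives a real multiple of 1, so dividing by that scalar yields the inverse.

```python
def qmul(p, q):
    """Product of quaternions written as 4-tuples (a1, a2, a3, a4) = a1*1 + a2*i + a3*j + a4*k."""
    a1, a2, a3, a4 = p
    b1, b2, b3, b4 = q
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,  # coefficient of 1
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,  # coefficient of i
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,  # coefficient of j
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,  # coefficient of k
    )


def conj(q):
    """The trial element of part (ii): negate the i, j, k coefficients."""
    a1, a2, a3, a4 = q
    return (a1, -a2, -a3, -a4)


q = (1, 2, 3, 4)
# The product is (a1^2 + a2^2 + a3^2 + a4^2) * 1, a nonzero scalar when q != 0.
print(qmul(q, conj(q)))  # (30, 0, 0, 0)
```
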

§ 3.7. 1. (iii) (a, b)T = (−a − b, −a + b).

(iv) jS/T = -20, + 20 2 
AT = -lft + 2 A. 

2. (ii) $\alpha_1 S = \alpha_1 + 2\alpha_2$

$\alpha_2 S = -\alpha_1 + \alpha_2$.

(iii) $\epsilon_1 S^{-1} = (\epsilon_1 - 2\epsilon_2)/3$
$\epsilon_2 S^{-1} = (\epsilon_1 + \epsilon_2)/3$.
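
As a quick check of (iii), the sketch below inverts the 2 × 2 matrix whose rows hold the coefficients from (ii). It assumes the coefficients read 1, 2 and −1, 1, and that the basis in (ii) is the same one used in (iii); `inverse_2x2` is a hypothetical helper, not something defined in the text.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix, computed exactly with rational arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Row i holds the coordinates of the image of the i-th basis vector under S.
S = [[1, 2], [-1, 1]]
print(inverse_2x2(S))
# Rows are (1/3, -2/3) and (1/3, 1/3), matching the coefficients in (iii).
```
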

3. (iii) (a, b)TS = (−2b, −3a − b)

(a, b)ST = (−3a, a + 2b).

5. (ii) Show that a basis for $\mathcal{R}_T$ together with a basis for $\mathcal{N}_T$ form
the desired basis for $\mathcal{V}$.
CHAPTER 4


§ 4.1. 3. °), B-(J °), C-(J J). 

4. (i) $E_{ij}$ has 1 in the (i, j) position and 0 elsewhere.

§ 4.2. 3. (ii) The (i, j) elements of AD and DA are $a_{ij}d_{jj}$ and $d_{ii}a_{ij}$, respectively.
These will be equal for all choices of the d's if and only if $a_{ij} = 0$
when $i \neq j$.

4. The major problem is to show that the inverse of a nonsingular
triangular matrix is triangular. Assume that AT = I, and show that
if $a_{rs} \neq 0$ for r > s, then one of the diagonal elements of T must be
zero. You may use Exercise 3, § 4.4, to conclude that T must then be
singular.

8-12. Use Exercise 7.


n -2 

13. (ii)- A = I 1 1 

\0 1 


(iii) E = 1 




§ 4.3. 2. (i) A reflection across the x axis. 

(iii) A reflection across the line y = x, followed by a projection 
onto the y axis. 


3. (i) Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ be idempotent. Show that

(a − d)(a + d − 1) = 0,

and consider separately the two cases which arise from this equation. 

(ii) As in (i), show that if A is nilpotent, then (a — d)(a + d) = 0, 

and consider separately the two cases which arise from this equation. 

§ 4.4. 3. Determine necessary and sufficient conditions that the row vectors 
form a linearly independent set. 

4. To prove the assertion concerning dimension, choose a basis for
$\mathcal{V}_n$ having $\xi$ as its first vector. Then consider the effect on $\xi$ of the linear
transformations $T_{ij}$ as defined in the proof of Theorem 3.16.

5. Use Exercise 3, § 3.2. 

8. A Markov matrix can be singular. 

9. Yes. 

11. (iii) I, X, Y, iZ, −I, −X, −Y, −iZ.

12. (ii) Show that $L(v_1)L(v_2) = L(v')$, where

$v' = \dfrac{(v_1 + v_2)c^2}{v_1 v_2 + c^2}$.
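
A small numerical illustration of the composition rule in 12 (ii) follows. It assumes the formula reads v' = (v₁ + v₂)c² / (v₁v₂ + c²); the function name `add_velocities` and the choice of units with c = 1 are illustrative only.

```python
def add_velocities(v1, v2, c=1.0):
    """Velocity parameter of the product L(v1)L(v2), assuming v' = (v1 + v2)c^2 / (v1 v2 + c^2)."""
    return (v1 + v2) * c ** 2 / (v1 * v2 + c ** 2)


# Two boosts at half the speed of light compose to 0.8c, not c.
print(add_velocities(0.5, 0.5))
# Composing with v = c leaves c unchanged.
print(add_velocities(1.0, 0.3))
```

Under this reading the resulting parameter never exceeds c in absolute value when both inputs are at most c in absolute value.
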




§ 4.5. 3. (i) The space spanned by $\{\alpha_1, \ldots, \alpha_k\}$ is T-invariant, as defined
in Exercise 7, § 3.3.

CHAPTER 5 

§ 5.1. 1. Interpret the system (5.3) as defining a linear transformation T
from $\mathcal{W}_n$ to $\mathcal{V}_m$. Then $\rho(A') = \rho(T)$.

3. $x_1 = -1 - 3x_4 + x_5$
$x_2 = -1 - 3x_4$

$x_3 = 1 + x_4 - 2x_5$

$x_4$ and $x_5$ arbitrary.

4. $x_1 = -1$, $x_2 = -2$, $x_3 = 4$. Solution is unique.

5. No solution exists. 

^ = 0 , then det^ e \ = 0 and 

§ 5.3. 2. If $i \neq k$, $\sum_{j=1}^{n} a_{ij}|A_{kj}|$ is the expression for expanding a determinant
for which row i and row k are identical.

4. Consider the various ways in which a nonzero product of n terms 
can be formed, one from each row and each column. 

6. (ii) det V is a polynomial of degree n − 1 in each $x_i$, which has
the value zero if $x_i = x_j$ for $i \neq j$. Hence $(x_j - x_i)$ is a factor for each
j > i, and $\det V = k \prod_{1 \le i < j \le n} (x_j - x_i)$, where k is a constant, perhaps
depending upon n. Prove that k = 1.
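
The product formula in 6 (ii) can be checked numerically on a small case. The sketch below assumes V has rows (1, xᵢ, xᵢ², …, xᵢⁿ⁻¹); the book's V may be arranged differently, for example transposed, which would not change the determinant. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row interchange changes the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result


def vandermonde(xs):
    """Matrix whose i-th row is (1, x_i, x_i^2, ..., x_i^(n-1))."""
    n = len(xs)
    return [[x ** j for j in range(n)] for x in xs]


def product_of_differences(xs):
    """Product of (x_j - x_i) over all pairs i < j."""
    p = 1
    for i, j in combinations(range(len(xs)), 2):
        p *= xs[j] - xs[i]
    return p


xs = [2, 3, 5, 7]
print(det(vandermonde(xs)), product_of_differences(xs))  # 240 240
```
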

§ 5.4. 1. $x_1 = 2$, $x_2 = -2$, $x_3 = -1$.

4. Use Theorem 5.12.

5. x = 1, 2, 3.

6. Use Exercise 6, § 5.3.

§ 5.5. 2. Multiply row i of B by $a_{ij}$ and expand by Theorem 5.5(b) to obtain
$2^{n-1}$ determinants, all but one of which are zero.

3. (i) det A = -29 

(ii) det B = −11

(iii) det C = — 

4. Expand det B as the sum of $2^{n-1}$ determinants, all but n of which
are zero; then apply Theorem 5.9.

5. (i) Use Exercise 4, pivoting on d. 


§ 5.2. 3. (i) If det(® 
CHAPTER 6


§ 6.1. 3. (iii) A canonical form for m X n matrices of rank k is 



§6.2. 3. Consider the number of linearly independent rows of each type 
of matrix. 


6 . 

7. $\det A_{ij} = 1 = -\det P_{ij}$; $\det M_i(c) = c$.

/-2 6 4\ 

§6.3. 1. (i) i ( 1 -3 2 . 

\ 1 5 2/ 



\ 

i 


2. Convert each to reduced echelon form and compare. 

7. (i) The reduced echelon matrix E which is row equivalent to A 

has rank m. Hence there are m columns which have zero in all posi- 
tions except one; these nonzero elements all equal 1 and appear in 
m distinct rows. 


§ 6.4. 1. See the answer for Exercise 1, § 6.3. 

4. All three matrices are of rank 2. 

5. First observe that PAP' is symmetric if A is symmetric. If $a_{ij} \neq 0$,
then $A_{ij}A(A_{ij})'$ has the element $2a_{ij}$ in the (j, j) position. A permutation
of rows 1 and j and a like permutation of columns places $2a_{ij}$ in the
(1, 1) position. Multiplication of row 1 and column 1 by $(2a_{ij})^{-1/2}$ produces
1 in the (1, 1) position. Row operations and corresponding column
operations then produce zeros in the remaining positions of the first
column and the first row.

6. (i) Yes. 

(ii) Not necessarily. 

(iii) Not necessarily, if both A and B are singular. Yes, if either 
A or B is nonsingular. 


( ezi + ezt + ezz 0 

0 ezi + z 2 + e 2 z s 

0 0 


s ) 
ez\ + e 2 Zi + z 3 / 



7. Use the results of Exercise 5, § 3.7. 
CHAPTER 7


§ 7.1. 1. (i) $\lambda_1 = -3$, $X_1 = (2a, -3a)$

$\lambda_2 = 2$, $X_2 = (a, a)$.

(ii) $\lambda_1 = -1 = \lambda_2$; two linearly independent characteristic vectors
can be chosen.

$\lambda_3 = 8$, $X_3 = (2a, a, 2a)$.

(iii) $\lambda_1 = 1 = \lambda_2 = \lambda_3$, $X_1 = (a, 0, -2a, 0)$

$\lambda_4 = 3$, $X_4 = (a, 0, 0, 0)$.

10. M and M' have the same characteristic values. If $XM' = \lambda X$,
let $m = \max |x_i|$, and show that $|\lambda x_i| \le m$ for all i.

§ 7.2. 1. (i) $\lambda_1 = -2$, $\lambda_2 = 1$, $\lambda_3 = 3$. A suitable diagonalizing matrix is



(ii) $\lambda_1 = 3 = \lambda_2$, $\lambda_3 = 12$. A suitable diagonalizing matrix is



(iii) $\lambda_1 = 1 = \lambda_2$, $\lambda_3 = 2$. Since the characteristic vectors associated
with the repeated characteristic value span only the one-dimensional
space [(5, 2, −5)], no diagonalizing matrix exists.

2. (i) and (ii) are similar to diagonal matrices; (iii) is not. 

3. Use Exercise 8, § 7.1. 

4. Use Exercise 7, § 7.1. 

5. (i) Consider $\det(A - \lambda I)$, substituting each suggested value.

(ii) Use column operations on $\det(A - \lambda I)$.

6. (ii) Using D as defined in (i), let C = and apply the 

results of (i). 

8. An example which proves the assertion is given by

$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.

§ 7.3. 6. (i) Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and consider separately the cases a + d = 0
and $a + d \neq 0$.

(ii) If A is real and 3 × 3, the characteristic polynomial of A is
of third degree with real coefficients; hence it has at least one real zero.
Thus $A^2$ has at least one nonnegative characteristic value. This implies
that $A^2 \neq -I$.
§ 7.4. 1. (i) 3R - [(1, 1)], 31 =[(-1,1)], Si = am

<«m (; j). 

(m) B - (o -?)• 

« p -(! -!)■ 

5. (iv) $E_i$ is represented by a matrix which consists of zeros except
for a square block I of size equal to the dimension of $\mathcal{R}_{E_i}$, beginning in
the (k + 1, k + 1) position, where $k = \dim \mathcal{R}_{E_1} + \cdots + \dim \mathcal{R}_{E_{i-1}}$.

§ 7.5. 1. (i) The minimal polynomial has distinct zeros.

(ii) The characteristic vectors do not span $\mathcal{V}_3$.

(iii) The characteristic vectors do not span $\mathcal{V}_3$.

(iv) The characteristic polynomial has distinct zeros.

2. A is similar to a diagonal matrix, which can be shown to be B. 
5. Compute $E_jA$, and deduce that $A - a_jI$ is singular. Any nonzero
vector of the form $YE_j$ is characteristic.

§ 7.6. 2. (i) $\rho(A) = 2$; $\mathcal{N}_A = [(0, 0, 1)]$; A is nilpotent of index 3.

(ii) $\{\xi_1, \xi_1T, \xi_1T^2\}$ is a basis.

(iii) Relative to the basis of (ii), T is represented by the matrix 



Hence k = 1 and pi = 2. 

3. (i) By direct calculation, AN = NA if and only if $a_{i,j-1} = a_{i+1,j}$
for i < n and j > 1, $a_{nj} = 0$ if j < n, and $a_{i1} = 0$ if i > 1.

(ii) The only characteristic value of A is $a_{11}$, and the characteristic
vectors can be shown to be those of the form $(0, 0, \ldots, 0, x_n)$.

(iii) Under the hypotheses, direct computation shows that the
characteristic vectors are those of the form $(0, \ldots, 0, x_{n-k+1}, \ldots, x_n)$.

4. The characteristic polynomial of a nilpotent matrix must be
$(-1)^n\lambda^n$; hence the conditions are $c_{n1} = c_{n2} = \cdots = c_{nn} = 0$.

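5. If $\xi \in \mathcal{R}_{T^{p-k}}$, then $\xi T^k = 0$. Hence $\mathcal{R}_{T^{p-k}} \subseteq \mathcal{N}_{T^k}$. As in Theorem 3.5,

$[0] \subseteq \mathcal{R}_{T^{p-1}} \subseteq \cdots \subseteq \mathcal{R}_T \subseteq \mathcal{V}$

$[0] \subseteq \mathcal{N}_T \subseteq \cdots \subseteq \mathcal{N}_{T^{p-1}} \subseteq \mathcal{V}$.

Prove by induction that if $\mathcal{R}_{T^{p-1}} = \mathcal{N}_T$, then $\mathcal{R}_{T^{p-k}} = \mathcal{N}_{T^k}$. Then use
Theorem 3.4 to show that if $\mathcal{R}_{T^{p-1}} \neq \mathcal{N}_T$, then $\mathcal{R}_T \neq \mathcal{N}_{T^{p-1}}$.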

§ 7.7. 2. Find the Jordan form of A by determining whether A is similar 
to a diagonal matrix. 

3. If A is not similar to a diagonal matrix, the two characteristic 
values of A must coincide, and the characteristic vectors must span
a one-dimensional space. Necessary and sufficient conditions are
$(a - d)^2 + 4bc = 0$ and $b^2 + c^2 > 0$.

1 

4. (i) ( 0 1 

0 


(i) $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$.



(ii) 


(iii) $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$.

6. Let AJ = /; for fixed i, let k be the smallest integer such that 
7 * 0. Then 8 tk = a lk \ k ^ 0, so k = i. A 3 X 3 example shows that 

A need not be in Jordan form. 

7. (iv) Segre characteristic: {(4, 2)(4)(1, 1)}.

(v) Characteristic polynomial: $(\lambda - 2)^6 \lambda^4 (\lambda - 1)^2$.

(vi) Minimal polynomial: $(\lambda - 2)^4 \lambda^4 (\lambda - 1)$.

(vii) Elementary divisors: $(\lambda - 2)^4$, $(\lambda - 2)^2$, $\lambda^4$, $(\lambda - 1)$, $(\lambda - 1)$.

§ 7.8.


/ 2 

-6 

2 

-4\ 

/l 

1 

1 

0 

43 _ f 0 

2 

0 

2 1 yl->- 


1 

0 

1 

A 2 

-2 

-2 

0 )’ A 

l 1 

1 

-1 

0 

\0 

2 

0 

-2/ 

\o 

1 

0 

-1 


2. Consider the Jordan form of A. 

5. (i) Yes. 

(ii) No. 

6. (i) Recall that a projection is idempotent, and hence its charac- 

teristic values are zero or one. 

7. Use Exercise 6, § 7.6.

8. $H_nAH_n^{-1}$ can be obtained from A by rotating each entry 180° about
the center of A.

9. A is similar to a matrix J in Jordan form; let J have t diagonal
sub-blocks. Let $P = \mathrm{diag}(B_1, \ldots, B_t)$ where each $B_i$ is of the form $H_{n_i}$
of Exercise 8, and where $n_i$ is the size of the ith sub-block of J. Then
$PJP^{-1} = J'$.
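
The idea behind Exercises 8 and 9 can be illustrated on a single Jordan block. The sketch assumes that $H_n$ is the n × n matrix with ones on the antidiagonal and zeros elsewhere (an assumption consistent with the 180° rotation described in Exercise 8, but not confirmed by the text here), and that a Jordan block carries its ones on the superdiagonal. All function names are hypothetical.

```python
def reversal(n):
    """Assumed form of H_n: ones on the antidiagonal, zeros elsewhere."""
    return [[1 if i + j == n - 1 else 0 for j in range(n)] for i in range(n)]


def jordan_block(lam, n):
    """n x n block with lam on the diagonal and ones on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


H = reversal(3)
J = jordan_block(5, 3)
# H is its own inverse, so H J H^{-1} = H J H.
print(matmul(H, H))
print(matmul(matmul(H, J), H) == transpose(J))  # True
```

Applying the same conjugation block by block gives the matrix P of Exercise 9.
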

10. First show that $A^2 = (\mathrm{tr}\,A)A$.
CHAPTER 8 


, 8 . 1 . 2 . (i) (_j 2 ) and (2 4 } 

(ii) Yes. 
9. Begin by moving a nonzero element $a_{ij}$ into the (1, 2) position
by means of congruence operations. Then multiply row 1 and column 1
by $-a_{ij}^{-1}$. This produces a canonical block in the upper left. Reduce
to zero the remaining elements of the first two rows and columns.


10. The canonical form has two diagonal blocks of the form 



and all other elements are zero. 


11. (i) One correct answer is \ 



/ 1 0 

(ii) 0 -1 

\o 0 

/ 1 0 

(iii) 0 1 

\o 0 

12. $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$.


§ 8.2. 5. If $\xi = k\eta$, direct calculations show that the Schwarz inequality
reduces to equality. Conversely, if $|p(\xi, \eta)|^2 = p(\xi, \xi)p(\eta, \eta)$, let
$k = -\dfrac{p(\xi, \eta)}{p(\eta, \eta)}$
and show that $p(\xi + k\eta, \xi + k\eta) = 0$.


§ 8.3. 1. (i) Expand $p(\xi + \eta, \xi + \eta)$.

(ii) Expand $p(\xi - \eta, \xi - \eta)$.

(iii) Expand $p(\xi + \eta, \xi - \eta)$.

5. Observe that $\|\alpha + c\beta\|^2 = \|\alpha\|^2 + c^2\|\beta\|^2 + 2c\,p(\alpha, \beta)$.

7. Choose a normal orthogonal basis for S; extend to a normal orthog- 
onal basis for V. 

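9. (ii) $(\mathcal{S} + \mathcal{T})^\perp$ is orthogonal to $\mathcal{S}$ and hence is a subspace of $\mathcal{S}^\perp$.
Likewise, $(\mathcal{S} + \mathcal{T})^\perp$ is a subspace of $\mathcal{T}^\perp$, and hence a subspace of $\mathcal{S}^\perp \cap \mathcal{T}^\perp$.
The reverse relation can be established without difficulty.

11. Compute $\xi T^2$ and compare with $\xi T$.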


§ 8.4. 1.

3. (i) $B = \begin{pmatrix} \cos\Psi & \sin\Psi \\ -\sin\Psi & \cos\Psi \end{pmatrix}$.

4. (i) $a^2 + b^2 = 1$, $c^2 + d^2 = 1$, $ac + bd = 0 = ab + cd$.

7. If an isometry T is represented by a Jordan matrix J relative to
the basis $\{\alpha_i\}$, then $\alpha_iT = \lambda_i\alpha_i + k\alpha_{i+1}$, where k = 0 or 1. Deduce that
k = 0, and therefore J is diagonal.
9. Consider $[(I - K)(I + K)^{-1}][(I - K)(I + K)^{-1}]'$, and observe
that I + K and I − K commute.
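
A 2 × 2 instance of the product in Exercise 9 is sketched below. It assumes K is real skew-symmetric, which is the setting in which $(I - K)(I + K)^{-1}$ turns out orthogonal; the exercise statement itself is not reproduced here, so treat that as an assumption. Helper names are hypothetical, and exact rational arithmetic is used so the check is not blurred by rounding.

```python
from fractions import Fraction


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def cayley(k):
    """(I - K)(I + K)^{-1} for the skew-symmetric K = [[0, k], [-k, 0]]."""
    k = Fraction(k)
    one = Fraction(1)
    i_minus_k = [[one, -k], [k, one]]
    i_plus_k = [[one, k], [-k, one]]
    return matmul(i_minus_k, inverse_2x2(i_plus_k))


Q = cayley(Fraction(1, 2))
print(Q)                         # entries 3/5, -4/5, 4/5, 3/5
print(matmul(Q, transpose(Q)))   # the identity matrix
```
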


§ 8.5. 1 . (i) Complete the square. 

2. (i) r = 3, s = 3. 


(ii) r = 3, s = −1.

(iii) r = 3, s = 1. 

3. Let z = XKX*. Show that $\bar{z} = z$ and that z must be real.

7. There are three types of rank three, and two of rank two. 

8. (ii) If A is nonsingular, $\rho(B_0) = n$. $XB_0X' > 0$, for all X. The
canonical matrix which is congruent to $B_0$ must be I. Why?

(iii) Observe that $C = DB_0D'$, where $D = \mathrm{diag}(x_1, \ldots, x_n)$. Hence
$B_0$ and C represent the same quadratic function.

(iv) $\det C = \prod \lambda_i$, where the $\lambda_i$ are real and positive. But the
geometric mean of any n positive real numbers never exceeds their
arithmetic mean.

(v) Show that $c_{ii} = 1$ for all i.

9. Since $c_j$ is the coefficient of $\lambda^{n-j}$ in the expansion of $\det(A - \lambda I)$,
consider any choice of n − j diagonal positions. Let $D_j$ denote the
determinant of the j × j matrix obtained by deleting from A the rows
and columns corresponding to the n − j diagonal positions. Then $c_j$ is
the signed sum of all the $D_j$ obtained from different choices of the
diagonal positions. There are $\binom{n}{j}$ such choices and, by Hadamard's
inequality, $|D_j| \le j^{j/2}k^j$.

§ 8.6. 2. Adapt the arguments of Theorems 8.35 and 8.36. 

3. Use Theorem 8.36. 

6. Choose a normal orthogonal basis $\{\alpha_i\}$ and calculate the two equal
expressions $p(\alpha_i, \alpha_jT^*)$ and $p(\alpha_iT, \alpha_j)$.

7. (ii) First show that if T is normal, so is $T - \lambda I$. Then compute
$p(\xi(T - \lambda I), \xi(T - \lambda I))$. This also solves (iii).

(iv) Show that $\lambda_1 p(\xi_1, \xi_2) = \lambda_2 p(\xi_1, \xi_2)$, using (ii) and (iii).


CHAPTER 9 

§9.2. 2. (i) Min = 12, max = 21. 

(ii) Min = 20, no max. 

3. (i) The inequality $\sum a_ix_i > b$ can be expressed in vector form
by $\alpha \cdot \xi > b$. Let $\xi$ and $\eta$ be solutions, and let $\zeta = k\xi + (1 - k)\eta$, where
0 < k < 1. Compute.
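
The computation asked for in 3 (i) can be tried on concrete numbers. The vectors and the bound below are made-up illustrative values, not data from the exercise, and the helper names are hypothetical. The point is that α·ζ equals the same convex combination of α·ξ and α·η, so it stays above b.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def convex_combination(xi, eta, k):
    """zeta = k*xi + (1 - k)*eta, computed componentwise."""
    return tuple(k * x + (1 - k) * y for x, y in zip(xi, eta))


alpha, b = (2, 1), 4          # hypothetical inequality 2*x1 + x2 > 4
xi, eta = (2, 1), (0, 6)      # two solutions: alpha.xi = 5, alpha.eta = 6

for k in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    zeta = convex_combination(xi, eta, k)
    # Linearity of the dot product: alpha.zeta = k*(alpha.xi) + (1 - k)*(alpha.eta).
    print(k, zeta, dot(alpha, zeta), dot(alpha, zeta) > b)
```
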

4. (i) For any line segment $L \subseteq C$, let $\xi, \eta \in L$, and let $\zeta$ lie between
$\xi$ and $\eta$ on L; then $\zeta = k\xi + (1 - k)\eta$ for some k, 0 < k < 1.
Then $f(\zeta) = \gamma \cdot \zeta$ is a convex linear combination of the numbers $f(\xi)$
and $f(\eta)$.
VI

o 

gold 

X] 

silver 

*2 

> 

ring y x 

J 

0 

5 

pin 2/2 



4 

earring 2/3 
< 

0 

1 

3 





30 

60 

max 




min 




s 


(ii) xi > 10, Xi > 3, X\ + 2x 2 > 1C. 

(iii) Min v = 480.

(iv) 00 rings, no pins, 00 pairs of earrings. 

3. (ii) The linear form is maximal at $y_1 = 1.4$, $y_2 = 2.4$.

(iii) The cost is 175 if Ji = = 0, = .75, x 4 = .25. 

§9.4. 2. (iv) Show that (P rg B)* = P r »B* f and use (ii) and (i), recognizing 

that P T8 = Par. 

(v) Show that ( BP rg )f r = Bf s P rt) and use (iii) and (i). 

3. (ii) A- - i (J -*). 

6. R coincides with I except for the entries in column i, which are of 
the form — a,^ 1 in row r ^ i t and a T ~] 1 in row i. 


CHAPTER 10 

5 10.1. 2. (i) - (J" °). 


(ii) Diagonalize A to obtain $PAP^{-1} = J = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$;

$e^A = P^{-1}e^JP = \tfrac{1}{4}\begin{pmatrix} e^2 + 3e^{-2} & 3e^2 - 3e^{-2} \\ e^2 - e^{-2} & 3e^2 + e^{-2} \end{pmatrix}$.
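
The closed form in (ii) can be compared with a truncated power series. The matrix A of the exercise is not legible in this copy; the sketch assumes A = [[−1, 3], [1, 1]], which is what the closed form yields when the exponential is replaced by the identity function, and assumes the scalar factor in front of the matrix is 1/4 (the value that makes the same expression reduce to I for the constant function 1). Both are inferences, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, terms=60):
    """Partial sum I + A + A^2/2! + ... of the exponential series."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds A^k / k!.
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


A = [[-1.0, 3.0], [1.0, 1.0]]  # assumed matrix; characteristic values 2 and -2
e2, em2 = math.exp(2), math.exp(-2)
closed_form = [
    [(e2 + 3 * em2) / 4, (3 * e2 - 3 * em2) / 4],
    [(e2 - em2) / 4, (3 * e2 + em2) / 4],
]
series = mat_exp(A)
print(series)
print(closed_form)
```
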

(iii) Use the result of (ii), together with Theorem 10.3. 


2 3\ 

2 2 ]. Observe that A is nilpotent. 

\0 0 2 / 

4. (i) Characteristic values: $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$.

$f(A) = \begin{pmatrix} -a + 2b & -2a + 2b & -2a + 2b \\ -b + c & -b + 2c & -b + c \\ a - c & 2a - 2c & 2a - c \end{pmatrix}$,

where a = f(1), b = f(2), c = f(3).
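
The formula for f(A) can be sanity-checked with simple choices of f. The sketch assumes the entries read as −a + 2b, −2a + 2b, and so on, with a = f(1), b = f(2), c = f(3); the matrix A itself is recovered by taking f(x) = x, which is an inference rather than something stated in the answer. Function names are hypothetical.

```python
def f_of_A(a, b, c):
    """f(A) from the answer, with a = f(1), b = f(2), c = f(3)."""
    return [
        [-a + 2 * b, -2 * a + 2 * b, -2 * a + 2 * b],
        [-b + c, -b + 2 * c, -b + c],
        [a - c, 2 * a - 2 * c, 2 * a - c],
    ]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


A = f_of_A(1, 2, 3)                       # f(x) = x recovers A
print(A)                                  # [[3, 2, 2], [1, 4, 1], [-2, -4, -1]]
print(f_of_A(1, 1, 1))                    # f(x) = 1 gives the identity matrix
print(f_of_A(1, 4, 9) == matmul(A, A))    # f(x) = x^2 gives A^2: True
```
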
(iii) Necessary and sufficient conditions are f(1) = f(2) = f(3).

$\begin{pmatrix} -e + 2e^2 + 0 & -2e + 2e^2 + 0 & -2e + 2e^2 + 0 \\ 0 - e^2 + e^3 & 0 - e^2 + 2e^3 & 0 - e^2 + e^3 \\ e + 0 - e^3 & 2e + 0 - 2e^3 & 2e + 0 - e^3 \end{pmatrix}$

§ 10.2. 5. $\frac{d}{dt}\det X(t) = \det(X_1', X_2, \ldots, X_n) + \det(X_1, X_2', \ldots, X_n) +
\cdots + \det(X_1, X_2, \ldots, X_n')$, where $X_i$ denotes column i of X(t).

0 0-1 

§10.3. 1. Y = FA, where A 10 1 

0 1 1 

The characteristic values of A are 1, 1, —1, and A is similar to 

1 1 0 /«■-* (t - uy-" 0 \ 

J = l 0 1 0 1; 0 e*~* 0 j 

0 0-1 \ 0 e-‘+V 

APPENDIX A 

§ A.1. 1. $m_1m_2 \cdots m_n$ elements.

3. (ii) m 2 elements. 

§ A.2. 1. $2^{mn}$ relations.

2. (ii) $\{(a, b) \in I \times I \mid b = ka \text{ for some } k \in I\}$.

§ A.3. 1. F = G means that dom F = dom G, and xF = xG for all

$x \in$ dom F.

2. $(n + 1)^m - 1$ functions.

5. $FF^* = I_A$, $F^*F = I_B$.

§ A.4. 2. $n^{(n^2)}$ binary operations.

3. (i) The graph of a relation R on the real numbers is the set of 

all points (a, b) in the real plane such that a R b. 

§ A.7. 2. (i) If both i and e are identity elements, then i = i * e = e. 

4. First show that x is also a left inverse of x'; then show that e is
also a left identity.

5. Use Exercise 4.

7. Consider 90° rotations about two fixed perpendicular axes, per- 
formed in each of the two possible orders. 

9. The systems are isomorphic. 

§ A.8. 1. (i) An isomorphism. 

(ii) Neither. 

(iii) A homomorphism onto. 

6. Yes. The mapping $x \to 2^x$, for example, is an isomorphism.
§ A.9. 1. R is reflexive if dom R = A, but not otherwise.

2. Each equivalence class is a line of slope m, and each such line 
is an equivalence class. 

4. E(4) = 5.

§ A.10. 1. (i) (a) is the set of all real numbers.

(ii) (a) is the set of all positive integers.

(iii) (L) is the set of all lines parallel to L.

3. (i) $(a)_n = \{x \in I \mid x = kn + a \text{ for some } k \in I\}$.

(iv) To show that each $(a)_n$, $a \neq 0$, has an inverse if n is prime,

consider $(1)_n(k)_n, (2)_n(k)_n, \ldots, (n - 1)_n(k)_n$ for any fixed k = 1, ...,
n − 1. These n − 1 cosets can be proved to be distinct from one another
and from $(0)_n$. Hence one of them must be $(1)_n$, which means that
$(k)_n$ has a multiplicative inverse.
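
The argument in 3 (iv) is easy to watch in action. The sketch below represents the coset $(a)_n$ by its least nonnegative residue; the function names are hypothetical. For prime n the products are distinct and nonzero, so one of them is 1; for composite n this can fail.

```python
def coset_products(k, n):
    """Residues of 1*k, 2*k, ..., (n-1)*k modulo n."""
    return [(j * k) % n for j in range(1, n)]


def inverse_mod_prime(k, n):
    """Find j with (j)_n (k)_n = (1)_n by scanning the list of products."""
    return coset_products(k, n).index(1) + 1


print(coset_products(3, 7))       # [3, 6, 2, 5, 1, 4]: distinct and nonzero
print(inverse_mod_prime(3, 7))    # 5, since 3 * 5 = 15 = 2*7 + 1
print(coset_products(2, 6))       # [2, 4, 0, 2, 4]: repeats and a zero, no 1
```
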



INDEX 


Abstract systems, 1-20, 256-268; homo- 
morphism of, 260-268; isomorphism of, 
260-268 

Adjoint: determinant of, 111; method for 
A~ l , 109; of a linear transformation, 181, 
214-217; of a matrix, 108-109 
Alias interpretation, 71 
Alibi interpretation, 71 
Angle, 177, 192-199 
Annihilator, 63 

Associativity, 7, 12, 14-18, 23, 48, 51, 66, 
258-259 


Basis: change of, 133-136, 180, 202; cyclic,
151, 161; dual, 61, 82; for idempotent
transformations, 73; for nilpotent
transformations, 161-162; normal orthogonal,
194-199, 212-215; of null space, 53;
orthogonal, 193-200, 206-217
Bessel’s inequality, 196 
Bidual space, 61 
Bilinear form, conjugate, 179 
Bilinear function, 178-189, 201, 272-273 
Bilinearity, 66, 89 

Binary: operation, 13, 255; relation, 8, 252
Binomial coefficient, 241 
Block multiplication, 91-95 
Boolean algebra, 256-258 
Brand, Louis, 172 

Canonical form, 119; for congruence, 182-
185; for conjunctivity, 183-184; for
equivalence, 130, 132; for Hermitian matrices,
183-185; for idempotent matrices, 138;
Jordan, 166-172; for nilpotent matrices,
164; rational, 152; for row equivalence,
128; for similarity, 139-176; for skew
symmetric matrices, 186-187; for
symmetric matrices, 182-185
Cartesian product, 8, 178, 251 
Cauchy inequality, 190 
Cayley, A., vii 

Cayley-Hamilton theorem. See Hamilton-Cayley theorem
Cayley’s theorem, 262 
Characteristic: equation, 142, 248; numbers, 
141; polynomial, 142-176, 206; roots, 
141; subspace, 142-143; values, 139-176, 
200, 205-208, 212-217, 240-243, 248; 
vectors, 139-176, 200, 205, 212-217


Closed operations, 13, 255 
Cofactor, 106 

Column: index, 75; vector, 83, 88, 101, 196, 
227 

Combinatorial equivalence, 115, 229-236 
Commutative group, 258 
Commutativity, 15-20, 80, 212 
Companion matrix, 152, 172 
Complement of a set, 6 
Complex nth roots of unity, 137, 159, 259
Complex numbers, 177-178, 185 
Congruence of integers, 9, 267 
Congruence of matrices, 180-187, 202, 206- 
208 

Conjugate: bilinearity, 66, 89, 177-189, 
201; of a complex matrix, 177-178; of a 
complex number, 185; operations, 181; 
transpose of a matrix, 181, 186 
Conjunctivity of matrices, 180-187, 202, 
207-208 

Consistency condition, 99 
Continuity, 23, 191, 244 
Convergence, 237-243
Convex, 221, 223 

Coordinate system, 38. See also Basis 
Cosets, 32, 264-268 
Cramer’s rule, 110 
Cyclic subspace, 151, 161 

Dantzig, G. B., 229
Derivative of a matrix, 244 
Determinant, 100-108, 111, 141, 246 
Diagonability, 143-147, 155-159, 182-185, 
206-214 

Diagonal matrix, 80-81, 84, 182-185 
Difference of sets, 7 
Differential equation, 27, 245-249 
Dimension: of a linear algebra, 66; of a vec- 
tor space, 37, 40-41 
Direction numbers, 194 
Direct sum, 31-32, 39-40, 151, 161-163, 
167-168 
Disjoint sets, 6 
Distance, 192, 195-198 
Distributivity, 15, 18, 257 
Division algebra, 68 
Division ring, 25 
Domain, 10, 252, 254 
Dot product, 62, 187-188, 194 
Dual: basis, 61, 82; problem, 225-229; 
space, 44-45, 57-65, 82, 215-217, 225-226 
Duality: in Boolean algebra, 258; in linear
programming, 225-229; theorem, 228

Echelon form, 123; reduced, 124-128 
Eigenvalues, 141 
Eigenvectors, 141 

Element, 4, 12-13; idempotent, 15; identity,
15, 258; inverse, 15, 258
Elementary divisors, 171 
Elementary matrices, 120-122, 181, 230 
Elementary operations on matrices, 119-
124, 129, 181, 230

Elementary operations on systems of equa- 
tions, 113 

Equivalence classes, 119, 203-206 
Equivalence relations, 13, 119, 262-263;
combinatorial equivalence, 115, 229-230;
congruence, 9, 180-187, 202, 206-208,
267; conjunctivity, 180-187, 202, 207-
208; equivalence, 128-130, 179; row
equivalence, 122-128; similarity, 132-
176, 200

Equivalent matrices, 128-130, 179 
Equivalent systems of equations, 112, 128, 
230 

Euclidean group, 197 
Euclidean space, 189-200, 214-218


Feasible vector, 227-228 
Field, 17-20, 25-26, 139; finite, 19; of scalar
matrices, 79, 84; as a vector space, 25-
26

Finite dimension, 37 
Finite field, 19 
Finite geometry, 3 
Fixed point, 140 

Form: conjugate bilinear, 179; Hermitian,
201-210; quadratic, 202-210
Forsythe, G. E., 111
Frobenius, 68

Function, 9-12, 254-255, 264-268; coset of
a, 264-268; domain of a, 10, 254; of a
function, 11-12; inverse, 11, 254; of a
matrix, 239-243; range of a, 10, 254;
reversible, 11, 254; value of a, 9
Functions: bilinear, 178-189, 201; conjugate
bilinear, 177-189, 201; Hermitian,
201-211; matrix of, 243-249; product of,
12; space of continuous, 23, 191


Gram-Schmidt orthogonalization, 193, 197 
Graut, P. J. L., 232 

Group, 258-260; commutative, 258; Euclid- 
ean, 197; full linear, 58; Lorentz, 91; of 
matrices, 84; Pauli, 91; unitary, 197 


Hadamard's inequality, 211
Half-space, 221, 223
Hamilton, W. R., vii, 67
Hamilton-Cayley theorem, 148-149, 152-
153, 173, 239

Hermitian: congruence, see Conjunctivity;
form, 201, 209; function, 201-204; matrix,
181-187, 202-211, 217; skew, 181, 186-
187, 203, 210; symmetry, 189;
transformation, 215

Homogeneous: differential equation, 246; 
system, 97-99 

Homomorphism: of abstract systems, 260-
262, 266-267; kernel of a, 52, 268; of
vector spaces, 40, 51
Huntington, E. V., 257
Hyperplane, 63

Idempotent: element, 15-16; linear
transformation, 50, 73, 87, 153-159, 175;
matrix, 81, 84, 87, 138, 159, 175
Identity: element, 15-18, 257-262, 268;
linear transformation, 48-51; matrix, 79-
80; right, 259

Independence. See Linear independence
Inequality: Bessel's, 196; Cauchy, 190;
Hadamard's, 211; Schwarz, 189-192;
triangle, 191-192

Infinite-dimensional, 37-38

Inner product, 187-191, 217; space, 189-218 
Integral of a matrix, 244 
Intersection of sets, 6, 256
Intersection of subspaces, 29
Invariant spaces, 54, 150-155, 160-163,
167-168

Inverse: element, 15-18, 258-259; of a lin- 
ear transformation, 55-58; of a mapping, 
10-12, 254; of a matrix, 82; right, 259 
Inverse of a matrix, calculation of: by adjoint
method, 108-111; by Hamilton-Cayley
theorem, 173-175; for an orthogonal
or unitary matrix, 199; by pivot
operations, 114-117, 235-236; by row and
column operation, 130-132; by row
operations, 125-126

Isometric diagonability, 212-214
Isometry, 197-200, 206-218
Isomorphism: of abstract systems, 260-268;
of matrices and linear transformations,
85-87; of vector spaces, 42-44, 62

Jordan canonical form, 159, 160-173, 239- 
242, 248 

Kernel of a homomorphism, 52, 268
Kronecker delta, 00 

Latent roots, 141 
Law of cosines, 195 

Left hand notation, 10-12, 75, 97, 269-273 
Length, 177, 188, 191-198 
Linear algebras, 29, 65-68, 86-87
Linear combinations, 29 
Linear dependence, 33-36 
Linear equations, 90-100, 109-117, 120-131, 
234-235; consistency condition for, 99; 
equivalence of, 112, 128, 234-235; homo- 
geneous, 97; nonhomogeneous, 98; opera- 
tions on, 111-117, 229-230; solution by 
Cramer’s rule, 110; solution of, 96 
Linear functional, 44-45, 58-65, 82-83, 215- 
217, 225-226 
Linear group, 58 

Linear independence, 33-41, 54, 60, 146, 
151, 155-157, 161, 193. 
Linear inequalities, 220-223
Linear programming, 224-229 
Linear space. See Vector space 
Linear transformation, 42-74, 84-87, 132- 
136, 153-175, 197-200, 206-207, 211-217, 
240; adjoint of, 214-217; characteristic 
values of, 139-143, 158, 167-171, 174, 200, 
217; characteristic vectors of, 139-143, 
200, 215, 217; Hermitian, 215; idempo- 
tent, 50, 73, 87, 153-159, 175; identity, 
48-49; inverse of, 57; isometric, 197-199, 
206-207, 211-212; Lorentz, 91; matrix 
representation of, 67-75, 85-88, 132-136; 
negative, 48-49; nilpotent, 50, 54, 87, 
159-168, 240; nonsingular, 55-58, 65, 167; 
normal, 215-217; null space of, 51-58, 94, 
167; nullity of, 51-58; orthogonal, 197- 
199, 206; range space of, 51-58, 126, 167; 
rank of, 51-58, 65; restricted to a sub- 
space, 150, 167-168; Segre characteristic 
of, 170; self adjoint, 215; specific form of, 
68-73; spectral form of, 158; spectrum of, 
141; symmetric, 215; trace of, 174-175; 
transpose of, 63-65, 82-83, 215-216; uni- 
tary, 197-199, 207; zero, 48-49 
Linear transformations: equality of, 47; 
product of, 48; scalar multiple of, 48; sum 
of, 48 

Lorentz matrices, 91 

MacDuffee, C. C., 79 
Main diagonal of a matrix, 81 
Mappings. See Functions, Linear transfor- 
mations 

Markov matrix, 90, 143 
Matrices: block multiplication of, 91-95; 
combinatorial equivalence of, 229-236; 
congruence of, 180-187, 206-208; con- 
junctivity of, 180-187, 207-208; equality 
of, 76; equivalence of, 128-136, 179; prod- 
uct of, 77-78; row equivalence of, 122- 
128; scalar multiple of, 76; sequences of, 
237-238; similarity of, 132-176, 206; sum 
of, 76 

Matrix: adjoint of a, 108-109; augmented, 
98-99; characteristic equation of a, 142, 
248; characteristic polynomial of a, 142- 
176, 206; characteristic values of a, 139— 
176, 205-217, 240-248; characteristic vec- 
tors of a, 139-176, 205-217; conjugate of 
a, 177-178, 181, 186; continuity of a, 244; 
derivative of a, 244; determinant of a, 
100-108; diagonability of a, 143-147, 155- 
159, 182-185, 206-214; diagonal, 80-81, 
182-185; elementary, 120-122, 181, 230; 
elementary divisors of a, 171; elementary
operations on a, 119-124, 129, 181, 230; 
of functions, 243, 249; functions of a, 239- 
243; Hermitian, 181-187, 202-211, 217; 
idempotent, 81, 84, 87, 138, 159, 175: 
identity, 79-80; impedance, 137; integral 
of a, 244; inverse of a, 82, 108-111, 114-
117, 125-126, 130-132, 173-175, 199, 235-
236; isometric diagonability of a, 212-214; 
Jordan form of a, 159, 166-173, 239-248; 
Lorentz, 91; main diagonal of a, 80; Mar- 
kov, 90, 143; minimal polynomial of, 148- 


150, 171, 206; nilpotent, 81, 84, 87, 147, 
159-171, 175, 240; nonsingular, 81-82, 
89-90, 143, 217-218; normal, 212-213, 
217; notation, 269-273; orthogonal, 198- 
200, 206-218; Pauli, 90-91; permutation, 
120-122, 129, 230-236; pivot operation on 
a, 113-117, 229-236; polynomial, 148- 
149; rank of a, 88-91, 130; row operations 
on a, 119-124, 129, 181, 230; scalar, 80; 
Segre characteristic of a, 170; signature of 
a, 203-204; singular, 81, 84, 108, 111; 
skew Hermitian, 181, 186, 210; skew- 
symmetric, 83-85, 111, 186-187, 200, 203: 
spectral form of a, 154-158; spectrum of 
a, 141; strictly triangular, 81, 84; sym- 
metric, 83-85, 132, 182-187, 201-209: 
trace of a, 174-175, 242, 246; transpose of 
a, 82-84, 96-97, 132, 176, 178-186, 199, 
203-218, 235, 269-273; triangular, 81, 84, 
90, 111, 116, 212-213; unitary, 198-200, 
207-218; Vandermonde, 108, 111; zero, 
77 

Member of a set, 4, 12-13 
Metric, 205 
Metric space, 192 

Minimal polynomial, 148-150, 171, 206 
Monic polynomial, 148 

Nilpotent linear transformations, 50, 54, 87,
159-168, 240

Nilpotent matrices, 81, 84, 87, 147, 159-171,
175, 240

Nonhomogeneous system, 98
Nonsingular linear transformation, 55-58, 
65, 167 

Nonsingular matrix, 89-90, 143 
Normal matrix, 212 

Normal orthogonal basis, 194-200, 206-217 
Normal transformation, 215-217 
Normal vector, 194 
nth roots of unity, 137, 159, 259 
n-tuples, ordered, 38, 251 
n-tuples, space of, 26, 36, 38, 43 
Null space of a linear transformation, 51-58, 
126, 167 

Nullity of a linear transformation, 51-58 

Operation: binary, 13-14, 255-258; closed, 
13, 16-17, 255, 258; elementary, 113, 120- 
129, 230; n-ary, 255-256; pivot, 114-117; 
unary, 13, 256 
Optimal vector, 228 
Ordered n-tuple, 38, 251-252 
Ordered pair, 8, 21 

Orthogonal: basis, 193-200, 206-207;
complement of a subspace, 194, 197; matrix,
198-200, 206-218; projection, 154-155, 
194, 197; transformation, 197-199, 206; 
vectors, 192-193, 200, 206, 217 
Orthogonality, Gram-Schmidt process for, 
193, 197 

Orthonormal, 194 

Parallelogram principle, 22 
Parseval’s identity, 196
Partial ordering, 222, 253 
Partition, 263-264 
Pauli matrices, 90-91
Permutation, 104 

Permutation matrix, 120-122, 129, 230-236 
Perpendicularity. See Orthogonality 
Pivot operations, 113-117, 229-236 
Polar decomposition, 217-218 
Polynomials: characteristic, 142-176, 206; 
minimal, 148-150, 171, 206; monic, 148; 
space of, 26, 27, 32, 38, 44, 50-51, 58-59, 
63 

Positive definite, 189, 204, 209-211, 218 
Postulates, 13-15, 256; for a Boolean alge- 
bra, 257; for a field, 17-18; for a group, 
258; for a linear algebra, 65-66; for a vec- 
tor space, 25 

Principal axes theorems, 206-207 
Product: of cosets, 266-268; dot, 187-188, 
194; inner, 187-191, 217; of linear trans- 
formations, 48, 69-73; of mappings, 12; 
of matrices, 77-78 

Projections, 45, 50, 73, 153-155, 158-159, 
175-176, 194, 197; orthogonal, 154-155, 
194; supplementary, 154-155 
Proper states, 141 
Proper values, 141 
Proper vectors, 141 
Pythagorean theorem, 196 

Quadratic form, 202-210; positive definite, 
204, 209-210; rank of, 203; real symmet- 
ric, 203-210; signature of, 204 
Quadratic function, 203-210 
Quadric surface, 209-211
Quaternions, 67-68, 79 
Quotient space, 33, 55, 58 
Quotient system, 267

Range, 10, 252, 254 

Range space of a linear transformation, 51- 
58, 126, 167 

Rank: of a conjugate bilinear form, 180; of 
a conjugate bilinear function, 180; of a 
linear transformation, 51-58, 65; of a ma- 
trix, 88-91, 130; of a quadratic form, 203 
Rational canonical form, 152 
Reflexive relation, 119, 262-263 
Relation, 8-9, 252-253, 262-263; binary, 8, 
252; cosets of a, 264; domain of a, 252; 
equivalence, 13, 119, 262-263; induced, 
264; range of a, 252; reflexive, 119, 262-
263; symmetric, 119, 262-263; transitive,
119, 262-263 

Right hand notation, 10-12, 75, 97, 269-273 
Rigid motion. See Isometry 
Rotations, 49, 70-71, 200, 259-260 
Row: equivalence of matrices, 122-128; 
index, 75; operations on matrices, 119- 
124, 129, 181, 230; vector, 83, 88-89, 196, 
198-200 

Scalar: matrix, 80, 84, 243; multiple of a 
linear transformation, 47-48; multiple of 
a matrix, 76; multiple of a vector, 22-23, 
25; polynomial, 148; product. See Inner 
product 

Scalars, 25; representing a linear transfor- 
mation, 67-75, 85-88, 132-136 


Schwarz inequality, 189-192 
Segre characteristic, 170
Self-adjoint: matrix, 181; transformation, 
215 

Sequence of matrices, 237-243 
Sequence of numbers, 237 
Series: of matrices, 238-243; of numbers, 
237; power, 238-243; Taylor, 210, 237 
Set, 3-7; subset of a, 5; void, 4 
Sets, 3-7, 251-252, 256-257; cartesian prod- 
uct of, 8, 251-252; complementation of, 
6-7; difference of, 7; disjoint, 6; equality 
of, 5; intersection of, 6-7; operations on, 
6, 13; symmetric difference of, 7; union of, 
6-7 

Signature of a quadratic or Hermitian func-
tion, 204

Similarity, canonical form for, 139-176
Similarity of matrices, 132-138, 144-147,
155-159, 163-165, 168-176, 199, 206-208,
211-213

Singular matrix, 81, 84, 108, 111

Skew-Hermitian matrix, 181, 186, 210 
Skew-symmetric matrix, 83-85, 111, 186- 
187, 200, 203 
Space. See Vector space 
Spectral form, 154-158 
Spectrum, 141 

Stochastic matrix. See Markov matrix 
Stone, M. H., 257 
Submatrix, 124 
Subsets, 5-6, 13 

Subspace, 27-33; annihilator, 63; charac- 
teristic, 142-143; cosets of, 32; cyclic, 151, 
161; invariant, 54, 150-155, 160-163, 
167-168; mapping of, 51; null, 51-58, 126,
167; orthogonal complement, 194, 197; 
projection on a, 45; range, 51-58, 126, 
167; spanned by vectors, 29 
Subspaces, 27-33; dimensions of, 41; direct 
sum of, 31-32; intersection of, 29; sum of, 
29 

Sum: direct, 31-32; of linear transforma- 
tions, 48; of matrices, 76; of subspaces, 29 
Superdiagonal of a matrix, 159 
Supplementary projection, 154-155 
Sylvester, J. J., vii 

Symmetric: difference of sets, 7; inner pro- 
duct, 189; matrix, 83-85, 132, 182-187, 
201-209; relation, 119, 262-263 
Symmetry, conjugate or Hermitian, 181 
System of linear equations, 96-100; aug- 
mented matrix of, 98-99; consistent, 99; 
equivalence of, 112, 128, 230; homoge- 
neous, 97-98; matrix of, 96; nonhomoge- 
neous, 98-99; operations on, 111-117; 
solution by Cramer’s rule, 110; solution 
by pivot operations, 113-117; solution of, 
96 

Taylor expansion, 210, 237 
Trace of a linear transformation, 174 
Trace of a matrix, 174-175, 242, 246 
Transitive relation, 119, 262-263
Transpose of a matrix, 82-84, 96-97, 132, 
176, 178-186, 199, 203-218, 235, 269-273 
Transposition, 103, 232-233 
Triangle inequality, 191
Triangular matrix, 81, 84, 90, 111, 166, 212- 
213; strictly, 81, 84 
Trotter, H. F., 161
Tucker, A. W., 227, 229 


Union of sets, 6, 256 

Unitary: matrix, 198-200, 207-218; space, 
189-199, 214-218; transformation, 197- 
199, 207 


Vandermonde matrix, 108, 111 
Vector: addition, 22-25; characteristic, 139-
176, 200, 205, 212-217; column, 83, 89,
101, 196, 227; components, 21; length of, 
188, 191-198; matrix representation of, 
83-84; normal, 194; orthogonal, 192; row,
83, 88-89, 196, 198-200; scalar multiple
of, 25

Vector space: of continuous functions, 23, 
32, 189; dimension of, 37, 40-41; dual, 44-
45, 57-63; Euclidean, 189-200, 214-218;
inner product, 189-218; of n-tuples, 26, 
36, 38, 43; of polynomials, 26, 27, 32, 38, 
44, 50-51, 58-59, 63; of scalars, 26; of
solutions of a differential equation, 27; 
subspace of, 27-33; unitary, 189-199, 
214-218. See also Subspace
Vector spaces, direct sum of, 31-32, 39-40, 
151, 161-163, 167-168 
Venn diagrams, 5-6 
Void set, 4 

Zero: mapping, 12; matrix, 77; subspace, 28;
transformation, 48-49; vector, 25







